FIELD AND BACKGROUND OF THE INVENTION

The present invention relates to a polynucleotide, referred to 
hereinbelow as hpa, encoding a polypeptide having heparanase activity, 
vectors (nucleic acid constructs) including same and genetically modified 
cells expressing heparanase. The invention further relates to a recombinant
protein having heparanase activity and to antisense oligonucleotides,
constructs and ribozymes for down-regulating heparanase activity. In
addition, the invention relates to heparanase promoter sequences and their 
uses. 

Heparan sulfate proteoglycans: Heparan sulfate proteoglycans
(HSPG) are ubiquitous macromolecules associated with the cell surface and
extracellular matrix (ECM) of a wide range of cells of vertebrate and
invertebrate tissues (1-4). The basic HSPG structure includes a protein core
to which several linear heparan sulfate chains are covalently attached.
These polysaccharide chains are typically composed of repeating hexuronic
and D-glucosamine disaccharide units that are substituted to a varying
extent with N- and O-linked sulfate moieties and N-linked acetyl groups
(1-4). Studies on the involvement of ECM molecules in cell attachment,
growth and differentiation revealed a central role of HSPG in embryonic
morphogenesis, angiogenesis, neurite outgrowth and tissue repair (1-5).
HSPG are prominent components of blood vessels (3). In large blood 
vessels they are concentrated mostly in the intima and inner media, whereas 
in capillaries they are found mainly in the subendothelial basement
membrane where they support proliferating and migrating endothelial cells
and stabilize the structure of the capillary wall. The ability of HSPG to 
interact with ECM macromolecules such as collagen, laminin and 
fibronectin, and with different attachment sites on plasma membranes 
suggests a key role for this proteoglycan in the self-assembly and
insolubility of ECM components, as well as in cell adhesion and
locomotion. Cleavage of the heparan sulfate (HS) chains may therefore
result in degradation of the subendothelial ECM and hence may play a
decisive role in extravasation of blood-borne cells. HS catabolism is
observed in inflammation, wound repair, diabetes, and cancer metastasis, 
suggesting that enzymes which degrade HS play important roles in 
pathologic processes. Heparanase activity has been described in activated 
immune system cells and highly metastatic cancer cells (6-8), but research 
has been handicapped by the lack of biologic tools to explore potential 
causative roles of heparanase in disease conditions. 

Involvement of Heparanase in Tumor Cell Invasion and 
Metastasis: Circulating tumor cells arrested in the capillary beds of 
different organs must invade the endothelial cell lining and degrade its 
underlying basement membrane (BM) in order to invade into the 
extravascular tissue(s) where they establish metastasis (9, 10). Metastatic 
tumor cells often attach at or near the intercellular junctions between 
adjacent endothelial cells. Such attachment of the metastatic cells is 
followed by rupture of the junctions, retraction of the endothelial cell 
borders and migration through the breach in the endothelium toward the 
exposed underlying BM (9). Once located between endothelial cells and the 
BM, the invading cells must degrade the subendothelial glycoproteins and 
proteoglycans of the BM in order to migrate out of the vascular 
compartment. Several cellular enzymes (e.g., collagenase IV, plasminogen 
activator, cathepsin B, elastase, etc.) are thought to be involved in 
degradation of BM (10). Among these enzymes is an
endo-β-D-glucuronidase (heparanase) that cleaves HS at specific intrachain sites
(6, 8, 11). Expression of an HS-degrading heparanase was found to correlate
with the metastatic potential of mouse lymphoma (11), fibrosarcoma and 
melanoma (8) cells. Moreover, elevated levels of heparanase were detected 
in sera from metastatic tumor bearing animals and melanoma patients (8) 
and in tumor biopsies of cancer patients (12). 

The control of cell proliferation and tumor progression by the local 
microenvironment, focusing on the interaction of cells with the 
extracellular matrix (ECM) produced by cultured corneal and vascular 
endothelial cells, was investigated previously by the present inventors. This 
cultured ECM closely resembles the subendothelium in vivo in its 
morphological appearance and molecular composition. It contains 
collagens (mostly types III and IV, with smaller amounts of types I and V),
proteoglycans (mostly heparan sulfate- and dermatan sulfate- proteoglycans, 
with smaller amounts of chondroitin sulfate proteoglycans), laminin, 
fibronectin, entactin and elastin (13, 14). The ability of cells to degrade HS 
in the cultured ECM was studied by allowing cells to interact with a 
metabolically sulfate labeled ECM, followed by gel filtration (Sepharose 
6B) analysis of degradation products released into the culture medium (11). 
While intact HSPG are eluted next to the void volume of the column
(Kav < 0.2, Mr ~ 0.5 x 10^6), labeled degradation fragments of HS side chains
are eluted more toward the Vt of the column (0.5 < Kav < 0.8, Mr = 5-7 x 10^3).
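
The Kav values quoted above can be related to measured elution volumes by the standard gel filtration partition coefficient, Kav = (Ve - V0) / (Vt - V0). The following is a minimal sketch only: the function names and the example column volumes are hypothetical and are not taken from the experiments described here; only the Kav windows (below 0.2, and between 0.5 and 0.8) come from the text.

```python
def kav(elution_volume, void_volume, total_volume):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0) for a gel filtration peak."""
    if total_volume <= void_volume:
        raise ValueError("total column volume must exceed the void volume")
    return (elution_volume - void_volume) / (total_volume - void_volume)


def classify_peak(k):
    """Assign a peak to one of the two Kav windows quoted in the text."""
    if k < 0.2:
        return "intact HSPG (near void volume)"
    if 0.5 < k < 0.8:
        return "HS degradation fragments"
    return "unclassified"


# Hypothetical column: V0 = 10 ml, Vt = 30 ml (illustrative values only).
early_peak = kav(12.0, 10.0, 30.0)
late_peak = kav(23.0, 10.0, 30.0)
print(classify_peak(early_peak))
print(classify_peak(late_peak))
```

A peak falling between the two windows is deliberately left unclassified, since the text does not assign such material to either species.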

The heparanase inhibitory effect of various non-anticoagulant 
species of heparin that might be of potential use in preventing extravasation 
of blood-borne cells was also investigated by the present inventors. 
Inhibition of heparanase was best achieved by heparin species containing 16 
sugar units or more and having sulfate groups at both the N and O positions: 
While O-desulfation abolished the heparanase inhibiting effect of heparin, 
O-sulfated, N-acetylated heparin retained a high inhibitory activity, 
provided that the N-substituted molecules had a molecular size of about 
4,000 daltons or more (7). Treatment of experimental animals with 
heparanase inhibitors (e.g., non-anticoagulant species of heparin) markedly 
reduced (>90%) the incidence of lung metastases induced by B16 
melanoma, Lewis lung carcinoma and mammary adenocarcinoma cells (7, 
8, 16). Heparin fractions with high and low affinity to anti-thrombin III 
exhibited a comparable high anti-metastatic activity, indicating that the 
heparanase inhibiting activity of heparin, rather than its anticoagulant 
activity, plays a role in the anti-metastatic properties of the polysaccharide 
(7). 

Heparanase activity in the urine of cancer patients: In an attempt 
to further elucidate the involvement of heparanase in tumor progression 
and its relevance to human cancer, urine samples were screened for
heparanase activity (16a). Heparanase activity was detected in the urine of some,
but not all, cancer patients. High levels of heparanase activity were 
determined in the urine of patients with an aggressive metastatic disease and 
there was no detectable activity in the urine of healthy donors. 

Heparanase activity was also found in the urine of 20% of normal 
and microalbuminuric insulin dependent diabetes mellitus (IDDM) patients, 
most likely due to diabetic nephropathy, the most important single disorder 
leading to renal failure in adults. 

Possible involvement of heparanase in tumor angiogenesis: 
Fibroblast growth factors are a family of structurally related polypeptides 
characterized by high affinity to heparin (17). They are highly mitogenic 
for vascular endothelial cells and are among the most potent inducers of 
neovascularization (17, 18). Basic fibroblast growth factor (bFGF) has been 
extracted from the subendothelial ECM produced in vitro (19) and from 
basement membranes of the cornea (20), suggesting that ECM may serve as 
a reservoir for bFGF. Immunohistochemical staining revealed the 
localization of bFGF in basement membranes of diverse tissues and blood 
vessels (21). Despite the ubiquitous presence of bFGF in normal tissues, 
endothelial cell proliferation in these tissues is usually very low, suggesting 
that bFGF is somehow sequestered from its site of action. Studies on the 
interaction of bFGF with ECM revealed that bFGF binds to HSPG in the 
ECM and can be released in an active form by HS degrading enzymes (15,
20, 22). It was demonstrated that heparanase activity expressed by platelets,
mast cells, neutrophils, and lymphoma cells is involved in release of active 
bFGF from ECM and basement membranes (23), suggesting that 
heparanase activity may not only function in cell migration and invasion,
but may also elicit an indirect neovascular response. These results suggest
that the ECM HSPG provides a natural storage depot for bFGF and possibly 
other heparin-binding growth promoting factors (24, 25). Displacement of 
bFGF from its storage within basement membranes and ECM may therefore 
provide a novel mechanism for induction of neovascularization in normal
and pathological situations.

Recent studies indicate that heparin and HS are involved in binding 
of bFGF to high affinity cell surface receptors and in bFGF cell signaling 
(26, 27). Moreover, the size of HS required for optimal effect was similar 
to that of HS fragments released by heparanase (28). Similar results were
obtained with vascular endothelial growth factor (VEGF) (29),
suggesting the operation of a dual receptor mechanism involving HS in cell 
interaction with heparin-binding growth factors. It is therefore proposed 
that restriction of endothelial cell growth factors in ECM prevents their 
systemic action on the vascular endothelium, thus maintaining a very low
rate of endothelial cell turnover and vessel growth. On the other hand,
release of bFGF from storage in ECM as a complex with an HS fragment may
elicit localized endothelial cell proliferation and neovascularization in
processes such as wound healing, inflammation and tumor development (24,
25). 

Expression of heparanase by cells of the immune system: 
Heparanase activity correlates with the ability of activated cells of the 
immune system to leave the circulation and elicit both inflammatory and 
autoimmune responses. Interaction of platelets, granulocytes, T and B 
lymphocytes, macrophages and mast cells with the subendothelial ECM is 
associated with degradation of HS by a specific heparanase activity (6). 
The enzyme is released from intracellular compartments (e.g., lysosomes, 
specific granules, etc.) in response to various activation signals (e.g., 
thrombin, calcium ionophore, immune complexes, antigens, mitogens, etc.), 
suggesting its regulated involvement in inflammation and cellular immunity. 

Some of the observations regarding the heparanase enzyme were 
reviewed in reference No. 6 and are listed hereinbelow: 

First, a proteolytic activity (plasminogen activator) and heparanase
participate synergistically in sequential degradation of the ECM HSPG by 
inflammatory leukocytes and malignant cells. 

Second, a large proportion of the platelet heparanase exists in a latent 
form, probably as a complex with chondroitin sulfate. The latent enzyme is 
activated by tumor cell-derived factor(s) and may then facilitate cell 
invasion through the vascular endothelium in the process of tumor 
metastasis. 

Third, release of the platelet heparanase from α-granules is induced
by a strong stimulant (i.e., thrombin), but not in response to platelet
activation on ECM.

Fourth, the neutrophil heparanase is preferentially and readily
released in response to a threshold activation and upon incubation of the
cells on ECM.

Fifth, contact of neutrophils with ECM inhibited release of noxious 
enzymes (proteases, lysozyme) and oxygen radicals, but not of enzymes 
(heparanase, gelatinase) which may enable diapedesis. This protective role
of the subendothelial ECM was observed when the cells were stimulated
with soluble factors but not with phagocytosable stimulants. 

Sixth, intracellular heparanase is secreted within minutes after 
exposure of T cell lines to specific antigens. 

Seventh, mitogens (Con A, LPS) induce synthesis and secretion of
heparanase by normal T and B lymphocytes maintained in vitro. T
lymphocyte heparanase is also induced by immunization with antigen in 
vivo. 

Eighth, heparanase activity is expressed by pre-B lymphomas and
B-lymphomas, but not by plasmacytomas and resting normal B
lymphocytes.

Ninth, heparanase activity is expressed by activated macrophages
during incubation with ECM, but there was little or no release of the
enzyme into the incubation medium. Similar results were obtained with
human myeloid leukemia cells induced to differentiate to mature 
macrophages. 

Tenth, T-cell mediated delayed type hypersensitivity and 
experimental autoimmunity are suppressed by low doses of heparanase 
inhibiting non-anticoagulant species of heparin (30). 

Eleventh, heparanase activity expressed by platelets, neutrophils and 
metastatic tumor cells releases active bFGF from ECM and basement 
membranes. Release of bFGF from storage in ECM may elicit a localized 
neovascular response in processes such as wound healing, inflammation and 
tumor development. 

Twelfth, among the breakdown products of the ECM generated by 
heparanase is a tri-sulfated disaccharide that can inhibit T-cell mediated 
inflammation in vivo (31). This inhibition was associated with an inhibitory 
effect of the disaccharide on the production of biologically active TNFα by
activated T cells in vitro (31). 

Other potential therapeutic applications: Apart from its 
involvement in tumor cell metastasis, inflammation and autoimmunity, 
mammalian heparanase may be applied to modulate: bioavailability of 
heparin-binding growth factors (15); cellular responses to heparin-binding 
growth factors (e.g., bFGF, VEGF) and cytokines (IL-8) (31a, 29); cell 
interaction with plasma lipoproteins (32); cellular susceptibility to certain
viral and some bacterial and protozoal infections (33, 33a, 33b); and
disintegration of amyloid plaques (34). Heparanase may thus prove useful
for conditions such as wound healing, angiogenesis, restenosis,
atherosclerosis, inflammation, neurodegenerative diseases and viral
infections. Mammalian heparanase can be used to neutralize plasma
heparin, as a potential replacement of protamine. Anti-heparanase
antibodies may be applied for immunodetection and diagnosis of
micrometastases, autoimmune lesions and renal failure in biopsy specimens,
plasma samples, and body fluids. Common use in basic research is
expected.

The identification of the hpa gene encoding the heparanase enzyme
will enable the production of a recombinant enzyme in heterologous
expression systems. Availability of the recombinant protein will pave the
way for solving the protein structure-function relationship and will provide a
tool for developing new inhibitors.

Viral Infection: The presence of heparan sulfate on cell surfaces
has been shown to be the principal requirement for the binding of Herpes
Simplex (33) and Dengue (33a) viruses to cells and for subsequent infection
of the cells. Removal of the cell surface heparan sulfate by heparanase may
therefore abolish virus infection. In fact, treatment of cells with bacterial
heparitinase (degrading heparan sulfate) or heparinase (degrading heparin)
reduced the binding of two related animal herpes viruses to cells and
rendered the cells at least partially resistant to virus infection (33). There
are some indications that the cell surface heparan sulfate is also involved in 
HIV infection (33b). 

Neurodegenerative diseases: Heparan sulfate proteoglycans were 
identified in the prion protein amyloid plaques of Gerstmann-Straussler
Syndrome, Creutzfeldt-Jakob disease and scrapie (34). Heparanase may
disintegrate these amyloid plaques which are also thought to play a role in 
the pathogenesis of Alzheimer's disease. 

Restenosis and Atherosclerosis: Proliferation of arterial smooth 
muscle cells (SMCs) in response to endothelial injury and accumulation of 
cholesterol rich lipoproteins are basic events in the pathogenesis of 
atherosclerosis and restenosis (35). Apart from its involvement in SMC 
proliferation (i.e., low affinity receptors for heparin-binding growth factors), 
HS is also involved in lipoprotein binding, retention and uptake (36). It was 
demonstrated that HSPG and lipoprotein lipase participate in a novel 
catabolic pathway that may allow substantial cellular and interstitial 
accumulation of cholesterol rich lipoproteins (32). The latter pathway is 
expected to be highly atherogenic by promoting accumulation of apoB and 
apoE rich lipoproteins (i.e., LDL, VLDL, chylomicrons), independent of
feedback inhibition by the cellular sterol content. Removal of SMC HS by
heparanase is therefore expected to inhibit both SMC proliferation and lipid
accumulation and thus may halt the progression of restenosis and
atherosclerosis. 

Gene therapy: 

The ultimate goal in the management of inherited as well as acquired
diseases is a rational therapy with the aim to eliminate the underlying
biochemical defects associated with the disease rather than symptomatic
treatment. Gene therapy is a promising candidate to meet these objectives.
Initially it was developed for treatment of genetic disorders; however, the
consensus view today is that it offers the prospect of providing therapy for a
variety of acquired diseases, including cancer, viral infections, vascular
diseases and neurodegenerative disorders. 

The gene-based therapeutic can act either intracellularly, affecting 
only the cells to which it is delivered, or extracellularly, using the recipient 
cells as local endogenous factories for the therapeutic product(s). The
application of gene therapy may follow any of the following strategies: (i)
prophylactic gene therapy, such as using gene transfer to protect cells
against viral infection; (ii) cytotoxic gene therapy, such as cancer therapy,
where genes encode cytotoxic products to render the target cells vulnerable
to attack by the normal immune response; (iii) biochemical correction,
primarily for the treatment of single gene defects, where a normal copy of
the gene is added to the affected or other cells.

To allow efficient transfer of the therapeutic genes, a variety of gene
delivery techniques have been developed based on viral and non-viral 
vector systems. The most widely used and most efficient systems for 
delivering genetic material into target cells are viral vectors. So far, 329 
clinical studies (phase I, I/II and II) with over 2,500 patients have been 
initiated worldwide since 1989 (50).

The gene addition approach poses serious barriers. The expression
of many genes is tightly regulated and context dependent, so achieving the
correct balance and function of expression is challenging. The gene itself is
often quite large, containing many exons and introns. The delivery vector is
usually a virus, which can infect with a high efficiency but may, on the
other hand, induce an immunological response and consequently decrease
effectiveness, especially upon secondary administration. Most of the
current expression vector-based gene therapy protocols fail to achieve
clinically significant transgene expression required for treating genetic
diseases. Apparently, it is difficult to deliver enough virus to the right cell
type to elicit an effective and therapeutic effect (51).

Homologous recombination, which was initially considered to be of 
limited use for gene therapy because of its low frequency in mammalian 
cells, has recently emerged as a potential strategy for developing gene 
therapy. Different approaches have been used to study homologous 
recombination in mammalian cells; some involve DNA repair mechanisms. 
These studies aimed at either gene disruption or gene correction and include
RNA/DNA chimeric oligonucleotides, small or large homologous DNA 
fragments, or adeno-associated viral vectors. Most of these studies show a 
reasonable frequency of homologous recombination, which warrants further 
in vivo testing (52). Homologous recombination-based gene therapy has the 
potential to develop into a powerful therapeutic modality for genetic 
diseases. It can offer permanent expression and normal regulation of 
corrected genes in appropriate cells or organs and probably can be used for 
treating dominantly inherited diseases such as polycystic kidney disease. 

Genomic sequences function in regulation of gene expression:

The efficient expression of therapeutic genes in target cells or tissues 
is an important component of efficient and safe gene therapy. The 
expression of genes is driven by the promoter region upstream of the coding 
sequence, although regulation of expression may be supplemented by 
farther upstream or downstream DNA sequences or DNA in the introns of 
the gene. Since this important information is embedded in the DNA, the 
description of gene structure is crucial to the analysis of gene regulation. 
Characterization of cell specific or tissue specific promoters, as well as 
other tissue specific regulatory elements enables the use of such sequences 
to direct efficient cell specific, or developmental stage specific gene 
expression. This information provides the basis for targeting individual 
genes and for control of their expression by exogenous agents, such as
drugs. Identification of transcription factors and other regulatory proteins
required for proper gene expression will point at new potential targets for 
modulating gene expression, when so desired or required. 

Efficient expression of many mammalian genes depends on the
presence of at least one intron. The expression of mouse thymidylate
synthase (TS) gene, for example, is greatly influenced by intron sequences.
The addition of almost any of the introns from the mouse TS gene to an
intronless TS minigene leads to a large increase in expression (42). The
involvement of intron 1 in the regulation of expression was demonstrated
for many other genes. In human factor IX (hFIX), intron 1 is able to
increase the expression level about 3-fold as compared to that of the
hFIX cDNA (43). The expression enhancing activity of intron 1 is due to
efficient functional splicing sequences present in the precursor mRNA. By
being efficiently assembled into spliceosome complexes, transcripts with
splicing sequences may be better protected in the nucleus from random
degradation than those without such sequences (44).

A forward-inserted intron 1-carrying hFIX expression cassette was
suggested to be useful for directed gene transfer, while for a
retroviral-mediated gene transfer system, a reversely-inserted intron
1-carrying hFIX expression cassette was considered (43).

A highly conserved cis-acting sequence element was identified in the 
first intron of the mouse and rat c-Ha-ras, and in the first exon of Ha- and 
Ki-ras genes of human, mouse and rat. This cis-acting regulatory sequence
confers strong transcription enhancer activity that is differentially
modulated by steroid hormones in metastatic and nonmetastatic
subpopulations. Perturbations in the regulatory activities of such cis-acting
sequences may play an important role in governing oncogenic potency of
Ha-ras through transcriptional control mechanisms (45).

Intron sequences affect tissue specific, as well as inducible gene 
expression. A 182 bp intron 1 DNA segment of the mouse Col2a1 gene
contains the necessary information to confer high-level, temporally correct,
chondrocyte expression on a reporter gene in intact mouse embryos, while
Col2a1 promoter sequences are dispensable for chondrocyte expression
(46). In the Col1A1 gene, the intron plays little or no role in constitutive
expression of collagen in the skin, and in cultured cells derived from the
skin; however, in the lungs of young mice, intron deletion results in a
decrease of expression to less than 50% (47).

A classical enhancer activity was shown in the 2 kb intron fragment
of the bovine beta-casein gene. The enhancer activity was largely dependent on
the lactogenic hormones, especially prolactin. It was suggested that several
elements in intron 1 of the bovine beta-casein gene cooperatively
interact not only with each other but also with its promoter for hormonal
induction (48).

Identification and characterization of regulatory elements in genomic
non-coding sequences, such as introns, provides a tool for designing and 
constructing novel vectors for tissue specific, hormone regulated or any 
other defined expression pattern, for gene therapy. Such an expression 
cassette was developed, utilizing regulatory elements from the human 
cytokeratin 18 (K18) gene, including 5' genomic sequences and one of its
introns. This cassette efficiently expresses reporter genes, as well as the 
human cystic fibrosis transmembrane conductance regulator (CFTR) gene, 
in cultured lung epithelial cells (49). 

Alternative splicing: 

Alternative splicing of pre-mRNA is a powerful and versatile
regulatory mechanism that can effect quantitative control of gene expression 
and functional diversification of proteins. It contributes to major 
developmental decisions and also to a fine-tuning of gene function. Genetic 
and biochemical approaches have identified cis-acting regulatory elements
and trans-acting factors that control alternative splicing of specific mRNAs. 
This mechanism results in the generation of variant isoforms of various 
proteins from a single gene. These include cell surface molecules such as 
CD44, receptors, cytokines such as VEGF and enzymes. Products of 
alternatively spliced transcripts differ in their expression pattern, substrate 
specificity and other biological parameters. 

The FGF receptor RNA undergoes alternative splicing which results
in the production of several isoforms, which exhibit different ligand binding 
specificities. The alternative splicing is regulated in a cell specific manner 
(53). 

Alternatively spliced mRNAs are often correlated with malignancy.
An increase in a specific splice variant of tyrosinase was identified in murine
melanomas (54). Multiple splicing variants of estrogen receptor are present
in individual human breast tumors. CD44 has various isoforms, some of which are
characteristic of malignant tissues.

Identification of tumor specific alternative splice variants provides
a new tool for cancer diagnostics. CD44 variants have been used for
detection of malignancy in urine samples from patients with urothelial
cancer by competitive RT-PCR (55). CD44 exon 6 was suggested as
a prognostic indicator of metastasis in breast cancer (56).

Different enzymes or polypeptides generated by alternative splicing 
may have different functions or catalytic specificity. The identification and
characterization of the enzyme forms, which are involved in pathological 
processes, is crucial for the design of appropriate and efficient drugs. 

Modulation of gene expression - Antisense technology: 

An antisense oligonucleotide (e.g., antisense
oligodeoxyribonucleotide) may bind its target nucleic acid either by
Watson-Crick base pairing or Hoogsteen and anti-Hoogsteen base pairing
(64). According to the Watson-Crick base pairing, heterocyclic bases of the
antisense oligonucleotide form hydrogen bonds with the heterocyclic bases
of target single-stranded nucleic acids (RNA or single-stranded DNA),
whereas according to the Hoogsteen base pairing, the target nucleic
acid is double-stranded DNA, wherein a third strand is
accommodated in the major groove of the B-form DNA duplex by 
Hoogsteen and anti-Hoogsteen base pairing to form a triple helix structure. 

According to both the Watson-Crick and the Hoogsteen base pairing 
models, antisense oligonucleotides have the potential to regulate gene 

10 expression and to disrupt the essential functions of the nucleic acids in cells. 
Therefore, antisense oligonucleotides have possible uses in modulating a 
wide range of diseases in which gene expression is altered. 
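
As an illustration of the Watson-Crick model described above, an antisense oligodeoxyribonucleotide against a single-stranded RNA target is simply the reverse complement of the target region. The sketch below is illustrative only: the function name and the 7-base target are arbitrary assumptions and do not correspond to any hpa sequence.

```python
# Watson-Crick partners for an RNA target read by a DNA oligonucleotide.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(target_mrna):
    """Return the antisense oligodeoxyribonucleotide (5'->3') for an mRNA segment (5'->3')."""
    target = target_mrna.upper()
    unknown = set(target) - set(RNA_TO_DNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases in RNA target: {sorted(unknown)}")
    # Reverse the target so the result is written 5'->3', then complement each base.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(target))


# Arbitrary example target, not taken from any real gene.
oligo = antisense_oligo("AUGGCUC")
print(oligo)
```

Triple helix (Hoogsteen) targeting of double-stranded DNA follows different pairing rules and is not covered by this sketch.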

Since the development of effective methods for chemically 
synthesizing oligonucleotides, these molecules have been extensively used 

15 in biochemistry and biological research and have the potential use in 
medicine, since carefully devised oligonucleotides can be used to control 
gene expression by regulating levels of transcription, transcripts and/or 
translation. 

Oligodeoxyribonucleotides as long as 100 base pairs (bp) are
routinely synthesized by solid phase methods using commercially available,
fully automated synthesis machines. The chemical synthesis of
oligoribonucleotides, however, is far less routine. Oligoribonucleotides are
also much less stable than oligodeoxyribonucleotides, a fact which has
contributed to the more prevalent use of oligodeoxyribonucleotides in 
medical and biological research, directed at, for example, the regulation of 
transcription or translation levels. 

Gene expression involves a few distinct and well regulated steps. The
first major step of gene expression involves transcription of a messenger 
RNA (mRNA) which is an RNA sequence complementary to the antisense 
(i.e., -) DNA strand, or, in other words, identical in sequence to the DNA 
sense (i.e., +) strand, composing the gene. In eukaryotes, transcription 
occurs in the cell nucleus. 
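
The strand relationship stated above can be checked with a short sketch: transcribing the antisense (template) strand yields an mRNA that matches the sense strand, with U in place of T. The function names and the 9-base example sequence are hypothetical and chosen only for illustration.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(antisense_strand):
    """Return the mRNA (5'->3') complementary to the antisense DNA strand (5'->3')."""
    return reverse_complement(antisense_strand).replace("T", "U")


# Arbitrary example sequence, not taken from any real gene.
sense = "ATGGCTCAA"
antisense = reverse_complement(sense)
mrna = transcribe(antisense)
print(antisense, mrna)
```

Here `mrna` equals `sense.replace("T", "U")`, which is the sense/antisense convention the paragraph describes.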

The second major step of gene expression involves translation of a 
protein (e.g., enzymes, structural proteins, secreted proteins, gene 
expression factors, etc.) in which the mRNA interacts with ribosomal RNA 
complexes (ribosomes) and amino acid activated transfer RNAs (tRNAs) to 
direct the synthesis of the protein coded for by the mRNA sequence. 

Initiation of transcription requires specific recognition of a promoter 
DNA sequence located upstream to the coding sequence of a gene by an 
RNA-synthesizing enzyme — RNA polymerase. This recognition is 
preceded by sequence-specific binding of one or more transcription factors 
to the promoter sequence. Additional proteins which bind at or close to the 
promoter sequence may upregulate transcription in trans via cis elements
known as enhancer sequences. Other proteins which bind to or close to the
promoter, but whose binding prohibits the action of RNA polymerase, are
known as repressors. 

There is also evidence that in some cases gene expression is
downregulated by endogenous antisense RNA repressors that bind a 
complementary mRNA transcript and thereby prevent its translation into a 
functional protein. 

Thus, gene expression is typically upregulated by transcription 
factors and enhancers and downregulated by repressors. 

However, in many disease situations gene expression is impaired. In
many cases, such as different types of cancer, for various reasons the
expression of a specific endogenous or exogenous (e.g., of a pathogen such
as a virus) gene is upregulated. Furthermore, in infectious diseases caused
by pathogens such as parasites, bacteria or viruses, the disease progression
depends on expression of the pathogen genes; this phenomenon may also be
considered, as far as the patient is concerned, as upregulation of exogenous
genes. 

Most conventional drugs function by interaction with and modulation 
of one or more targeted endogenous or exogenous proteins, e.g., enzymes. 
Such drugs, however, typically are not specific for targeted proteins but 
interact with other proteins as well. Thus, a relatively large dose of drug 
must be used to effectively modulate a targeted protein.
Typical daily doses of drugs are from 10⁻⁵ to 10⁻¹ millimoles per
kilogram of body weight, or 10⁻³ to 10 millimoles for a 100 kilogram person.
If this modulation instead could be effected by interaction with and 
inactivation of mRNA, a dramatic reduction in the necessary amount of 
drug could likely be achieved, along with a corresponding reduction in side 
effects. Further reductions could be effected if such interaction could be 
rendered site-specific. Given that a functioning gene continually produces 
mRNA, it would thus be even more advantageous if gene transcription 
could be arrested in its entirety. 
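
By way of illustration only, the dose arithmetic quoted above can be checked with a few lines of Python. This is a minimal sketch: the function and constant names are hypothetical, and the only figures taken from the text are the per-kilogram range and the 100 kilogram body weight.

```python
def total_dose_mmol(dose_mmol_per_kg: float, body_weight_kg: float) -> float:
    """Scale a per-kilogram dose to a whole-body dose in millimoles."""
    return dose_mmol_per_kg * body_weight_kg


# Per-kilogram range quoted in the text: 10^-5 to 10^-1 mmol/kg.
LOW_MMOL_PER_KG, HIGH_MMOL_PER_KG = 1e-5, 1e-1
BODY_WEIGHT_KG = 100.0  # the 100 kilogram person used in the text

low_total = total_dose_mmol(LOW_MMOL_PER_KG, BODY_WEIGHT_KG)
high_total = total_dose_mmol(HIGH_MMOL_PER_KG, BODY_WEIGHT_KG)
print(f"whole-body dose range: {low_total:g} to {high_total:g} mmol")
```

Multiplying each end of the per-kilogram range by the body weight reproduces the whole-body range of 10⁻³ to 10 millimoles stated above.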

Given these facts, it would be advantageous if gene expression could 
be arrested or downmodulated at the transcription level. 

The ability to chemically synthesize oligonucleotides and analogs
thereof having a selected predetermined sequence offers a means for
downmodulating gene expression. Three types of gene expression
modulation strategies may be considered.

At the transcription level, antisense or sense oligonucleotides or 
analogs that bind to the genomic DNA by strand displacement or the 
formation of a triple helix, may prevent transcription (64). 

At the transcript level, antisense oligonucleotides or analogs that 
bind target mRNA molecules lead to the enzymatic cleavage of the hybrid 
by intracellular RNase H (65). In this case, by hybridizing to the targeted 
mRNA, the oligonucleotides or oligonucleotide analogs provide a duplex
hybrid recognized and destroyed by the RNase H enzyme. Alternatively,
such hybrid formation may lead to interference with correct splicing (66).
As a result, in both cases, the number of intact target mRNA transcripts
ready for translation is reduced or eliminated.

At the translation level, antisense oligonucleotides or analogs that 
bind target mRNA molecules prevent, by steric hindrance, binding of 
essential translation factors (ribosomes), to the target mRNA, a 
phenomenon known in the art as hybridization arrest, disabling the 
translation of such mRNAs (67). 
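
By way of a non-limiting illustration, the base-pairing requirement common to the strategies above can be sketched in Python: an antisense DNA oligonucleotide is the reverse complement of the targeted mRNA segment. The function name and the example target below are hypothetical and are not taken from any sequence of the present disclosure.

```python
# Watson-Crick partners for a DNA oligonucleotide pairing with an RNA target.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna_oligo(mrna_segment: str) -> str:
    """Return the DNA oligo (5'->3') complementary to an mRNA segment (5'->3')."""
    segment = mrna_segment.upper()
    unknown = set(segment) - set(RNA_TO_DNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected RNA symbols: {sorted(unknown)}")
    # Reverse the target so that the result also reads 5'->3', then complement.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(segment))


# Hypothetical 15-nucleotide target, for illustration only.
example_target = "AUGGCUACCGUUAGC"
oligo = antisense_dna_oligo(example_target)
print(oligo)
```

The sketch addresses sequence complementarity only; specificity, stability and uptake, discussed below, are separate design considerations.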

Thus, antisense sequences, which as described hereinabove may
arrest the expression of any endogenous and/or exogenous gene depending
on their specific sequence, have attracted much attention from scientists and
pharmacologists devoted to developing the antisense approach
into a new pharmacological tool (68).

For example, several antisense oligonucleotides have been shown to
arrest hematopoietic cell proliferation (69), growth (70) and entry into the S
phase of the cell cycle (71), to reduce survival (72) and to prevent receptor
mediated responses (73). For use of antisense oligonucleotides as antiviral
agents the reader is referred to reference 74.

For efficient in vivo inhibition of gene expression using antisense 
oligonucleotides or analogs, the oligonucleotides or analogs must fulfill the 
following requirements: (i) sufficient specificity in binding to the target
sequence; (ii) solubility in water; (iii) stability against intra- and
extracellular nucleases; (iv) capability of penetration through the cell 
membrane; and (v) when used to treat an organism, low toxicity. 

Unmodified oligonucleotides are impractical for use as antisense
sequences since they have short in vivo half-lives, during which they are
degraded rapidly by nucleases. Furthermore, they are difficult to prepare in
more than milligram quantities. In addition, such oligonucleotides are poor
cell membrane penetrators (75).

Thus it is apparent that in order to meet all the above listed
requirements, oligonucleotide analogs need to be devised in a suitable
manner. Therefore, an extensive search for modified oligonucleotides has
been initiated.

For example, problems arising in connection with double-stranded
DNA (dsDNA) recognition through triple helix formation have been
diminished by a clever "switch back" chemical linking, whereby a sequence
of polypurine on one strand is recognized, and by "switching back", a
homopurine sequence on the other strand can be recognized. Also, good
helix formation has been obtained by using artificial bases, thereby
improving binding conditions with regard to ionic strength and pH.

In addition, in order to improve half-life as well as membrane
penetration, a large number of variations in polynucleotide backbones have
been made, though with little success.

Oligonucleotides can be modified either in the base, the sugar or the
phosphate moiety. These modifications include, for example, the use of
methylphosphonates, monothiophosphates, dithiophosphates,
phosphoramidates, phosphate esters, bridged phosphorothioates, bridged
phosphoramidates, bridged methylenephosphonates, dephospho
internucleotide analogs with siloxane bridges, carbonate bridges,
carboxymethyl ester bridges, acetamide bridges, carbamate bridges,
thioether bridges, sulfoxy bridges, sulfono bridges, various "plastic" DNAs,
α-anomeric bridges and borane derivatives. For further details the reader is
referred to reference 76.

International patent application WO 89/12060 discloses various 
building blocks for synthesizing oligonucleotide analogs, as well as 
oligonucleotide analogs formed by joining such building blocks in a defined 
sequence. The building blocks may be either "rigid" (i.e., containing a ring
structure) or "flexible" (i.e., lacking a ring structure). In both cases, the
building blocks contain a hydroxy group and a mercapto group, through
which the building blocks are said to join to form oligonucleotide analogs.
The linking moiety in the oligonucleotide analogs is selected from the group
consisting of sulfide (-S-), sulfoxide (-SO-), and sulfone (-SO2-). However,
the application provides no data supporting the specific binding of an
oligonucleotide analog to a target oligonucleotide.

International patent application WO 92/20702 describes an acyclic
oligonucleotide which includes a peptide backbone on which any selected
chemical nucleobases or analogs are strung and serve as coding characters
as they do in natural DNA or RNA. These new compounds, known as
peptide nucleic acids (PNAs), are not only more stable in cells than their
natural counterparts, but also bind natural DNA and RNA 50 to 100 times
more tightly than the natural nucleic acids cling to each other (77). PNA
oligomers can be synthesized from the four protected monomers containing
thymine, cytosine, adenine and guanine by Merrifield solid-phase peptide
synthesis. In order to increase solubility in water and to prevent
aggregation, a lysine amide group is placed at the C-terminus.

Thus, antisense technology requires pairing of messenger RNA with 
an oligonucleotide to form a double helix that inhibits translation. The 
concept of antisense-mediated gene therapy was already introduced in 1978
for cancer therapy. This approach was based on certain genes that are
crucial in cell division and growth of cancer cells. Synthetic fragments of
DNA can achieve this goal. Such molecules bind to the
targeted gene molecules in RNA of tumor cells, thereby inhibiting the
translation of the genes and resulting in dysfunctional growth of these cells.
Other mechanisms have also been proposed. These strategies have been
used, with some success, in the treatment of cancers, as well as other illnesses,
including viral and other infectious diseases. Antisense oligonucleotides
are typically synthesized in lengths of 13-30 nucleotides. The life span of
oligonucleotide molecules in blood is rather short. Thus, they have to be
chemically modified to prevent destruction by ubiquitous nucleases present
in the body. Phosphorothioates are a very widely used modification in
ongoing antisense oligonucleotide clinical trials (57). A new generation of
antisense molecules consists of hybrid antisense oligonucleotides with a
central portion of synthetic DNA while four bases on each end have been
modified with 2'-O-methyl ribose to resemble RNA. In preclinical studies in
laboratory animals, such compounds have demonstrated greater stability to
metabolism in body tissues and an improved safety profile when compared
with the first-generation unmodified phosphorothioate (Hybridon Inc.
news). Dozens of other nucleotide analogs have also been tested in
antisense technology.

RNA oligonucleotides may also be used for antisense inhibition as 
they form a stable RNA-RNA duplex with the target, suggesting efficient 
inhibition. However, due to their low stability RNA oligonucleotides are 
typically expressed inside the cells using vectors designed for this purpose. 
This approach is favored when attempting to target an mRNA that encodes
an abundant and long-lived protein (57). 

Recent scientific publications have validated the efficacy of antisense 
compounds in animal models of hepatitis, cancers, coronary artery 
restenosis and other diseases. The first antisense drug was recently
approved by the FDA. This drug, Fomivirsen, developed by Isis, is indicated
for local treatment of cytomegalovirus in patients with AIDS who are
intolerant of or have a contraindication to other treatments for CMV retinitis
or who were insufficiently responsive to previous treatments for CMV
retinitis (Pharmacotherapy News Network).

Several antisense compounds are now in clinical trials in the United
States. These include locally administered antivirals and systemic cancer
therapeutics. Antisense therapeutics have the potential to treat many
life-threatening diseases with a number of advantages over traditional drugs.
Traditional drugs intervene after a disease-causing protein is formed.
Antisense therapeutics, however, block mRNA transcription/translation and
intervene before a protein is formed, and since antisense therapeutics target
only one specific mRNA, they should be more effective with fewer side
effects than current protein-inhibiting therapy.

A second option for disrupting gene expression at the level of
transcription uses synthetic oligonucleotides capable of hybridizing with
double stranded DNA. A triple helix is formed. Such oligonucleotides may
prevent binding of transcription factors to the gene's promoter and therefore
inhibit transcription. Alternatively, they may prevent duplex unwinding
and, therefore, transcription of genes within the triple helical structure.

Another approach is the use of specific nucleic acid sequences to act 
as decoys for transcription factors. Since transcription factors bind specific
DNA sequences, it is possible to synthesize oligonucleotides that will
effectively compete with the native DNA sequences for available
transcription factors in vivo. This approach requires the identification of
a gene specific transcription factor (57).

Indirect inhibition of gene expression was demonstrated for matrix 
metalloproteinase genes (MMP-1, -3, and -9), which are associated with 
invasive potential of human cancer cells. E1AF is a transcription activator 
of MMP genes. Expression of E1AF antisense RNA in HSC3AS cells
showed a decrease in mRNA and protein levels of MMP-1, -3, and -9.
Moreover, HSC3AS showed lower invasive potential in vitro and in vivo. 
These results imply that transfection of antisense inhibits tumor invasion by 
down-regulating MMP genes (58). 

Ribozymes:

Ribozymes are being increasingly used for the sequence-specific 
inhibition of gene expression by the cleavage of mRNAs encoding proteins 
of interest. The possibility of designing ribozymes to cleave any specific 
target RNA has rendered them valuable tools in both basic research and 
therapeutic applications. In the therapeutics area, ribozymes have been 
exploited to target viral RNAs in infectious diseases, dominant oncogenes 
in cancers and specific somatic mutations in genetic disorders. Most 
notably, several ribozyme gene therapy protocols for HIV patients are 
already in Phase 1 trials (62). More recently, ribozymes have been used for
transgenic animal research, gene target validation and pathway elucidation.
Several ribozymes are in various stages of clinical trials. ANGIOZYME
was the first chemically synthesized ribozyme to be studied in human
clinical trials. ANGIOZYME specifically inhibits formation of the VEGF-r
(Vascular Endothelial Growth Factor receptor), a key component in the
angiogenesis pathway. Ribozyme Pharmaceuticals, Inc., as well as other
firms have demonstrated the importance of anti-angiogenesis therapeutics in
animal models. HEPTAZYME, a ribozyme designed to selectively destroy
Hepatitis C Virus (HCV) RNA, was found effective in decreasing Hepatitis
C viral RNA in cell culture assays (Ribozyme Pharmaceuticals,
Incorporated - WEB home page).

Gene disruption in animal models: 

The emergence of gene inactivation by homologous recombination 
methodology in embryonic stem cells has revolutionized the field of mouse
genetics. The availability of a rapidly growing number of mouse null
mutants has represented an invaluable source of knowledge on mammalian
development, cellular biology and physiology, and has provided many
models for human inherited diseases. Animal models are required for an
effective drug delivery development program and evaluation of gene
therapy approaches. The improvement of the original knockout strategy, as
well as exploitation of exogenous enzymatic systems that are active in the
recombination process, has considerably extended the range of genetic
manipulations that can be produced. Additional methods have been
developed to provide versatile research tools: the double replacement method,
sequential gene targeting, conditional cell type specific gene targeting,
the single copy integration method, inducible gene targeting, gene disruption
by viral delivery, replacing one gene with another (the so called knock-in
method) and the induction of specific balanced chromosomal translocations.
It is now possible to introduce a point mutation as a unique change in the
entire genome, therefore allowing very fine dissection of gene function in
vivo. Furthermore, the advent of methods allowing conditional gene
targeting opens the way for analysis of the consequences of a particular
mutation in a defined organ and at a specific time during the life of the
experimental animal (59).

DNA vaccination: 

Observations in the early 1990s that plasmid DNA could directly
transfect animal cells in vivo sparked exploration of the use of DNA
plasmids to induce an immune response by direct injection into an animal of
DNA encoding an antigenic protein. When a DNA vaccine plasmid enters the
eukaryotic cell, the protein-coding sequence it carries is transcribed and
translated within the cell. In the case of pathogens, the resulting proteins
are presented to the immune system in their native form, mimicking the
presentation of antigens during a natural infection. DNA vaccination is
particularly useful for the induction of T cell activation. It was applied for
viral and bacterial infectious diseases, as well as for allergy and for cancer.
The central hypothesis behind active specific immunotherapy for cancer is
that tumor cells express unique antigens that should stimulate the immune
system. The first DNA vaccine against a tumor encoded carcinoembryonic
antigen (CEA). DNA vaccinated animals showed immunoprotection against,
and immunotherapy of, human CEA-expressing syngeneic mouse colon and
breast carcinoma (61). In a mouse model of neuroblastoma, DNA
immunization with HuD resulted in tumor growth inhibition with no
neurological disease (60). Immunity to the brown locus protein, gp75
tyrosinase-related protein-1, associated with melanoma, was investigated in
a syngeneic mouse model. Priming with human gp75 DNA broke tolerance
to mouse gp75. Immunity against mouse gp75 provided significant tumor
protection (60).

Glycosyl hydrolases:

Glycosyl hydrolases are a widespread group of enzymes that
hydrolyze the O-glycosidic bond between two or more carbohydrates or
between a carbohydrate and a noncarbohydrate moiety. The enzymatic
hydrolysis of the glycosidic bond occurs via one of two major mechanisms
leading to overall retention or inversion of the anomeric configuration. In
both mechanisms catalysis involves two residues: a proton donor and a
nucleophile. Glycosyl hydrolases have been classified into 58 families
based on amino acid similarities. The glycosyl hydrolases from families 1,
2, 5, 10, 17, 30, 35, 39 and 42 act on a large variety of substrates; however,
they all hydrolyze the glycosidic bond in a general acid catalysis
mechanism, with retention of the anomeric configuration. The mechanism
involves two glutamic acid residues, which are the proton donor and the
nucleophile, with an asparagine always preceding the proton donor.
Analyses of a set of known 3D structures from this group revealed that their
catalytic domains, despite the low level of sequence identity, adopt a similar
(α/β)8 fold with the proton donor and the nucleophile located at the
C-terminal ends of strands β4 and β7, respectively. Mutations in the
functional conserved amino acids of lysosomal glycosyl hydrolases were
identified in lysosomal storage diseases.

Lysosomal glycosyl hydrolases including β-glucuronidase,
β-mannosidase, β-glucocerebrosidase, β-galactosidase and α-L-iduronidase are
all exo-glycosyl hydrolases, belong to the GH-A clan and share a similar
catalytic site. However, many endo-glucanases from various organisms,
such as bacterial and fungal xylanases and cellulases, share this catalytic
domain.

Genomic sequence of hpa gene and its implications: 
It is well established that heparanase activity is correlated with 
cancer metastasis. This correlation was demonstrated at the level of 
enzymatic activity as well as the levels of protein and hpa cDNA expression 
in highly metastatic cancer cells as compared with non-metastatic cells. As 
such, inhibition of heparanase activity is desirable, and has been attempted
by several means. The genomic region encoding the hpa gene and its
surroundings provides a powerful new tool for regulation of heparanase
activity at the level of gene expression. Regulatory sequences may reside in
noncoding regions both upstream and downstream of the transcribed region as
well as in intron sequences. A DNA sequence upstream of the transcription
start site contains the promoter region and potential regulatory elements. 
Regulatory factors, which interact with the promoter region may be 
identified and be used as potential drugs for inhibition of cancer, metastasis 
and inflammation. The promoter region can be used to screen for inhibitors 
of heparanase gene expression. Furthermore, the hpa promoter can be used 
to direct cell specific, particularly cancer cell specific, expression of foreign 
genes, such as cytotoxic or apoptotic genes, in order to specifically destroy 
cancer cells. 

Cancer and yet unknown related genetic disorders may involve 
rearrangements and mutations in the heparanase gene, either in coding or 
non-coding regions. Such mutations may affect expression level or 
enzymatic activity. The genomic sequence of hpa enables the amplification 
of specific genomic DNA fragments, identification and diagnosis of 
mutations. 

There is thus a widely recognized need for, and it would be highly
advantageous to have, genomic, cDNA and composite polynucleotides
encoding a polypeptide having heparanase activity, vectors including same,
genetically modified cells expressing heparanase and a recombinant protein
having heparanase activity, as well as antisense oligonucleotides, constructs
and ribozymes which can be used for down regulation of heparanase activity.

SUMMARY OF THE INVENTION

Cloning of the human hpa gene which encodes heparanase, and 
expression of recombinant heparanase by transfected host cells is reported 
herein, as well as downregulation of heparanase activity by antisense 
technology. 

A purified preparation of heparanase isolated from human hepatoma
cells was subjected to tryptic digestion and microsequencing. The
YGPDVGQPR (SEQ ID NO:8) sequence revealed was used to screen EST
databases for homology to the corresponding back translated DNA
sequence. Two closely related EST sequences were identified and were
thereafter found to be identical. Both clones contained an insert of 1020 bp
which included an open reading frame of 973 bp followed by 27 bp of 3'
untranslated region and a poly A tail. A translation start site was not
identified.
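
By way of illustration only, a peptide-to-EST screen of this kind can be sketched in Python. The sketch swaps in a closely related technique: instead of back translating the peptide into degenerate DNA, it translates each candidate nucleotide sequence in all six reading frames and looks for the peptide. The function names and the toy EST below are hypothetical, and the toy sequence is constructed for the example rather than taken from any database.

```python
from itertools import product

# Standard genetic code, with codons enumerated over the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate DNA codon by codon; codons with unknown symbols become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_peptide(dna: str, peptide: str):
    """Yield (strand, frame, protein_offset) for each hit in the six frames."""
    dna = dna.upper()
    strands = {"+": dna, "-": dna.translate(COMPLEMENT)[::-1]}
    for strand, seq in strands.items():
        for frame in range(3):
            protein = translate(seq[frame:])
            offset = protein.find(peptide)
            while offset != -1:
                yield strand, frame, offset
                offset = protein.find(peptide, offset + 1)


PEPTIDE = "YGPDVGQPR"  # tryptic peptide of SEQ ID NO:8
# Hypothetical toy EST: one possible coding of the peptide, flanked by filler.
toy_est = "GC" + "TATGGTCCTGATGTTGGTCAACCTCGT" + "TAAGG"
hits = list(find_peptide(toy_est, PEPTIDE))
print(hits)
```

Translating the database rather than back translating the query avoids enumerating the many degenerate DNA sequences that could encode a nine-residue peptide; either direction tests the same correspondence between peptide and nucleotide sequence.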

Cloning of the missing 5' end of hpa was performed by PCR 
amplification of DNA from placenta Marathon RACE cDNA composite
using primers selected according to the EST clones sequence and the linkers
of the composite. A 900 bp PCR fragment, partially overlapping with the
identified 3' encoding EST clones was obtained. The joined cDNA
fragment (hpa), 1721 bp long (SEQ ID NO:9), contained an open reading
frame which encodes a polypeptide of 543 amino acids (SEQ ID NO: 10)
with a calculated molecular weight of 61,192 daltons.

Cloning an extended 5' sequence was enabled from the human
SK-hep1 cell line by PCR amplification using the Marathon RACE. The 5'
extended sequence of the SK-hep1 hpa cDNA was assembled with the
sequence of the hpa cDNA isolated from human placenta (SEQ ID NO:9).
The assembled sequence contained an open reading frame, SEQ ID NOs: 13
and 15, which encodes, as shown in SEQ ID NOs: 14 and 15, a polypeptide
of 592 amino acids with a calculated molecular weight of 66,407 daltons.

The ability of the hpa gene product to catalyze degradation of 
heparan sulfate in an in vitro assay was examined by expressing the entire 
open reading frame of hpa in insect cells, using the Baculovirus expression
system. Extracts and conditioned media of cells infected with virus
containing the hpa gene demonstrated a high level of heparan sulfate
degradation activity both towards soluble ECM-derived HSPG and intact
ECM. This degradation activity was inhibited by heparin, which is another
substrate of heparanase. Cells infected with a similar construct containing
no hpa gene had no such activity, nor did non-infected cells. The activity of
heparanase expressed from the extended 5' clone towards heparin was
demonstrated in a mammalian expression system.

The expression pattern of hpa RNA in various tissues and cell lines
was investigated using RT-PCR. It was found to be expressed only in 
tissues and cells previously known to have heparanase activity. 

A panel of monochromosomal human/CHO and human/mouse 
somatic cell hybrids was used to localize the human heparanase gene to 
human chromosome 4. The newly isolated heparanase sequence can be 
used to identify a chromosome region harboring a human heparanase gene 
in a chromosome spread. 

A human genomic library was screened and the human locus 
harboring the heparanase gene isolated, sequenced and characterized. 
Alternatively spliced heparanase mRNAs were identified and characterized. 
The human heparanase promoter has been isolated, identified and positively 
tested for activity. The mouse heparanase promoter has been isolated and 
identified as well. Antisense heparanase constructs were prepared and their 
influence on cells in vitro tested. A predicted heparanase active site was
identified. Finally, the presence of sequences hybridizing with human
heparanase sequences was demonstrated for a variety of mammals and
for an avian species.

According to one aspect of the present invention there is provided an 
isolated nucleic acid comprising a genomic, complementary or composite 
polynucleotide sequence encoding a polypeptide having heparanase 
catalytic activity.

According to further features in preferred embodiments of the
invention described below, the polynucleotide or a portion thereof is
hybridizable with SEQ ID NOs: 9, 13, 42, 43 or a portion thereof at 68 °C in
6 x SSC, 1 % SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon
sperm DNA, and ³²P labeled probe and wash at 68 °C with 3 x SSC and 0.1
% SDS.

According to still further features in the described preferred 
embodiments the polynucleotide or a portion thereof is at least 60 % 
identical with SEQ ID NOs: 9, 13, 42, 43 or portions thereof as determined 
using the Bestfit procedure of the DNA sequence analysis software package 
developed by the Genetics Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 12, gap extension penalty - 4).
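
By way of illustration only, the percent identity criterion can be sketched in Python for two sequences that have already been aligned. This is not the Bestfit procedure itself, which also constructs the alignment under the stated gap penalties; the sketch merely scores a given alignment. The function name, the toy sequences and the choice to exclude gapped columns from the denominator are assumptions made for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Both inputs must be the same length, with '-' marking gaps.
    Excluding gapped columns is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns in which either sequence has a gap
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 9 ungapped columns, 8 of them identical.
toy_a = "ACGTACGT-A"
toy_b = "ACGAACGTTA"
identity = percent_identity(toy_a, toy_b)
meets_threshold = identity >= 60.0  # the 60 % criterion stated above
print(f"{identity:.1f} % identical, meets threshold: {meets_threshold}")
```

Different programs treat gapped columns differently when reporting identity, so a figure computed this way is comparable to a Bestfit figure only to the extent that the same convention is used.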

According to still further features in the described preferred 
embodiments the polypeptide is as set forth in SEQ ID NOs: 10, 14, 44 or 
portions thereof.

According to still further features in the described preferred 
embodiments the polypeptide is at least 60 % homologous to SEQ ID 
NOs: 10, 14, 44 or portions thereof as determined with the Smith-Waterman 
algorithm, using the Bioaccelerator platform developed by Compugen
(gapop: 10.0, gapext: 0.5, matrix: blosum62).

According to additional aspects of the present invention there are
provided a nucleic acid construct (vector) comprising the isolated nucleic 
acid described herein and a host cell comprising the construct. 

According to a further aspect of the present invention there is 
provided an antisense oligonucleotide comprising a polynucleotide or a 
polynucleotide analog of at least 10 bases being hybridizable in vivo, under 
physiological conditions, with a portion of a polynucleotide strand encoding 
a polypeptide having heparanase catalytic activity. 

According to an additional aspect of the present invention there is 
provided a method of in vivo downregulating heparanase activity 
comprising the step of in vivo administering the antisense oligonucleotide 
herein described. 

According to yet an additional aspect of the present invention there is 
provided a pharmaceutical composition comprising the antisense 
oligonucleotide herein described and a pharmaceutically acceptable carrier. 

According to still an additional aspect of the present invention there 
is provided a ribozyme comprising the antisense oligonucleotide described 
herein and a ribozyme sequence. 

According to a further aspect of the present invention there is 
provided an antisense nucleic acid construct comprising a promoter 
sequence and a polynucleotide sequence directing the synthesis of an 
antisense RNA sequence of at least 10 bases being hybridizable in vivo,
under physiological conditions, with a portion of a polynucleotide strand
encoding a polypeptide having heparanase catalytic activity.

According to further features in preferred embodiments of the
invention described below, the polynucleotide strand encoding the
polypeptide having heparanase catalytic activity is as set forth in SEQ ID
NOs: 9, 13, 42 or 43.

According to still further features in the described preferred
embodiments the polypeptide having heparanase catalytic activity is as set
forth in SEQ ID NOs: 10, 14 or 44.

According to still a further aspect of the present invention there is
provided a method of in vivo downregulating heparanase activity
comprising the step of in vivo administering the antisense nucleic acid
construct herein described.

According to yet a further aspect of the present invention there is
provided a pharmaceutical composition comprising the antisense nucleic
acid construct herein described and a pharmaceutically acceptable carrier.

According to a further aspect of the present invention there is
provided a nucleic acid construct comprising a polynucleotide sequence
functioning as a promoter, the polynucleotide sequence is derived from SEQ
ID NO:42 and includes at least nucleotides 2535-2635 thereof or from SEQ
ID NO:43 and includes at least nucleotides 320-420.



According to a further aspect of the present invention there is 
provided a method of expressing a polynucleotide sequence comprising the 
step of ligating the polynucleotide sequence into the nucleic acid construct 
described above, downstream of the polynucleotide sequence derived from 
SEQ ID NOs:42 or 43.

According to a further aspect of the present invention there is 
provided a recombinant protein comprising a polypeptide having heparanase 
catalytic activity. 

According to further features in preferred embodiments of the 
invention described below, the polypeptide includes at least a portion of
SEQ ID NOs: 10, 14 or 44. 

According to still further features in the described preferred 
embodiments the protein is encoded by a polynucleotide hybridizable with 
SEQ ID NOs: 9, 13, 42, 43 or a portion thereof at 68 °C in 6 x SSC, 1 %
SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA,
and ³²P labeled probe and wash at 68 °C with 3 x SSC and 0.1 % SDS.

According to still further features in the described preferred 
embodiments the protein is encoded by a polynucleotide at least 60 % 
identical with SEQ ID NOs: 9, 13, 42, 43 or portions thereof as determined 
using the Bestfit procedure of the DNA sequence analysis software package
developed by the Genetics Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 12, gap extension penalty - 4).

According to a further aspect of the present invention there is
provided a pharmaceutical composition comprising, as an active ingredient, 
the recombinant protein herein described. 

According to a further aspect of the present invention there is 
provided a method of identifying a chromosome region harboring a 
heparanase gene in a chromosome spread comprising the steps of (a) 
hybridizing the chromosome spread with a tagged polynucleotide probe 
encoding heparanase; (b) washing the chromosome spread, thereby 
removing excess of non-hybridized probe; and (c) searching for signals 
associated with the hybridized tagged polynucleotide probe, wherein
detected signals are indicative of a chromosome region harboring a
heparanase gene. 

According to a further aspect of the present invention there is 
provided a method of in vivo eliciting anti-heparanase antibodies 
comprising the steps of administering a nucleic acid construct including a 
polynucleotide segment corresponding to at least a portion of SEQ ID 
NOs:9, 13 or 43 and a promoter for directing the expression of said 
polynucleotide segment in vivo. Accordingly, there is provided also a DNA 
vaccine for in vivo eliciting anti-heparanase antibodies comprising a nucleic 
acid construct including a polynucleotide segment corresponding to at least 
a portion of SEQ ID NOs:9, 13 or 43 and a promoter for directing the 
expression of said polynucleotide segment in vivo.

The present invention can be used to develop new drugs to inhibit
tumor cell metastasis, inflammation and autoimmunity. The identification 
of the hpa gene encoding for heparanase enzyme enables the production of 
a recombinant enzyme in heterologous expression systems. Additional 
features, advantages, uses and applications of the present invention in 
biological science and in diagnostic and therapeutic medicine are described 
hereinafter. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention is herein described, by way of example only, with 
reference to the accompanying drawings, wherein: 

FIG. 1 presents the nucleotide sequence and deduced amino acid 
sequence of hpa cDNA. A single nucleotide difference at position 799 (A 
to T) between the EST (Expressed Sequence Tag) and the PCR amplified 
cDNA (reverse transcribed RNA) and the resulting amino acid substitution 
(Tyr to Phe) are indicated above and below the substituted unit, 
respectively. Cysteine residues and the polyadenylation consensus 
sequence are underlined. The asterisk denotes the stop codon TGA. 

FIG. 2 demonstrates degradation of soluble sulfate labeled HSPG 
substrate by lysates of High Five cells infected with pFhpa2 virus. Lysates 
of High Five cells that were infected with pFhpa2 virus (•) or control pF2 
virus (□) were incubated (18 h, 37 °C) with sulfate labeled ECM-derived 
soluble HSPG (peak I). The incubation medium was then subjected to gel 
filtration on Sepharose 6B. Low molecular weight HS degradation 
fragments (peak II) were produced only during incubation with the pFhpa2 
infected cells, but there was no degradation of the HSPG substrate (V) by 
lysates of pF2 infected cells. 

FIGs. 3a-b demonstrate degradation of soluble sulfate labeled HSPG 
substrate by the culture medium of pFhpa2 and pFhpa4 infected cells. 
Culture media of High Five cells infected with pFhpa2 (3a) or pFhpa4 (3b) 
viruses (•), or with control viruses (□) were incubated (18 h, 37 °C) with 
sulfate labeled ECM-derived soluble HSPG (peak I, f). The incubation 
media were then subjected to gel filtration on Sepharose 6B. Low 
molecular weight HS degradation fragments (peak II) were produced only 
during incubation with the hpa gene containing viruses. There was no 
degradation of the HSPG substrate by the culture medium of cells infected 
with control viruses. 

FIG. 4 presents size fractionation of heparanase activity expressed by 
pFhpa2 infected cells. Culture medium of pFhpa2 infected High Five cells 
was applied onto a 50 kDa cut-off membrane. Heparanase activity 
(conversion of the peak I substrate (o) into peak II HS degradation 
fragments) was found in the high (> 50 kDa) (•), but not low (< 50 kDa) (o) 
molecular weight compartment. 



FIGs. 5a-b demonstrate the effect of heparin on heparanase activity 
expressed by pFhpa2 and pFhpa4 infected High Five cells. Culture media 
of pFhpa2 (5a) and pFhpa4 (5b) infected High Five cells were incubated 
(18 h, 37 °C) with sulfate labeled ECM-derived soluble HSPG (peak I, ♦) in 
the absence (o) or presence (t) of 10 µg/ml heparin. Production of low 
molecular weight HS degradation fragments was completely abolished in 
the presence of heparin, a potent inhibitor of heparanase activity (6, 7). 

FIGs. 6a-b demonstrate degradation of sulfate labeled intact ECM by 
virus infected High Five and Sf21 cells. High Five (6a) and Sf21 (6b) cells 
were plated on sulfate labeled ECM and infected (48 h, 28 °C) with pFhpa4 
(o) or control pF1 (□) viruses. Control non-infected Sf21 cells (r) were 
plated on the labeled ECM as well. The pH of the cultured medium was 
adjusted to 6.0 - 6.2 followed by 24 h incubation at 37 °C. Sulfate labeled 
material released into the incubation medium was analyzed by gel filtration 
on Sepharose 6B. HS degradation fragments were produced only by cells 
infected with the hpa containing virus. 

FIGs. 7a-b demonstrate degradation of sulfate labeled intact ECM by 
virus infected cells. High Five (7a) and Sf21 (7b) cells were plated on 
sulfate labeled ECM and infected (48 h, 28 °C) with pFhpa4 (o) or control 
pF1 (□) viruses. Control non-infected Sf21 cells (r) were plated on labeled 
ECM as well. The pH of the cultured medium was adjusted to 6.0 - 6.2, 
followed by 48 h incubation at 28 °C. Sulfate labeled degradation 
fragments released into the incubation medium were analyzed by gel 
filtration on Sepharose 6B. HS degradation fragments were produced only 
by cells infected with the hpa containing virus. 

FIGs. 8a-b demonstrate degradation of sulfate labeled intact ECM by 
the culture medium of pFhpa4 infected cells. Culture media of High Five 
(8a) and Sf21 (8b) cells that were infected with pFhpa4 (•) or control pF1 
(□) viruses were incubated (48 h, 37 °C, pH 6.0) with intact sulfate labeled 
ECM. The ECM was also incubated with the culture medium of control 
non-infected Sf21 cells (r). Sulfate labeled material released into the 
reaction mixture was subjected to gel filtration analysis. Heparanase 
activity was detected only in the culture medium of pFhpa4 infected cells. 

FIGs. 9a-b demonstrate the effect of heparin on heparanase activity 
in the culture medium of pFhpa4 infected cells. Sulfate labeled ECM was 
incubated (24 h, 37 °C, pH 6.0) with culture medium of pFhpa4 infected 
High Five (9a) and Sf21 (9b) cells in the absence (•) or presence (V) of 10 
µg/ml heparin. Sulfate labeled material released into the incubation 
medium was subjected to gel filtration on Sepharose 6B. Heparanase 
activity (production of peak II HS degradation fragments) was completely 
inhibited in the presence of heparin. 

FIGs. 10a-b demonstrate purification of recombinant heparanase on 
heparin-Sepharose. Culture medium of Sf21 cells infected with pFhpa4 
virus was subjected to heparin-Sepharose chromatography. Elution of 
fractions was performed with 0.35 - 2 M NaCl gradient (♦). Heparanase 
activity in the eluted fractions is demonstrated in Figure 10a (o). Fractions 
15-28 were subjected to 15 % SDS-polyacrylamide gel electrophoresis 
followed by silver nitrate staining. A correlation is demonstrated between a 
major protein band (MW ~ 63,000) in fractions 19-24 and heparanase 
activity. 

FIGs. 11a-b demonstrate purification of recombinant heparanase on 
a Superdex 75 gel filtration column. Active fractions eluted from 
heparin-Sepharose (Figure 10a) were pooled, concentrated and applied onto 
Superdex 75 FPLC column. Fractions were collected and aliquots of each 
fraction were tested for heparanase activity (c, Figure 11a) and analyzed by 
SDS-polyacrylamide gel electrophoresis followed by silver nitrate staining 
(Figure 11b). A correlation is seen between the appearance of a major 
protein band (MW ~ 63,000) in fractions 4-7 and heparanase activity. 

FIGs. 12a-e demonstrate expression of the hpa gene by RT-PCR with 
total RNA from human embryonal tissues (12a), human extra-embryonal 
tissues (12b) and cell lines from different origins (12c-e). Shown are 
RT-PCR products obtained using hpa specific primers (I) and primers for 
the GAPDH housekeeping gene (II), and control reactions without reverse 
transcriptase demonstrating absence of genomic DNA or other contamination 
in the RNA samples (III). M - DNA molecular weight marker VI (Boehringer 
Mannheim). For 12a: lane 1 - neutrophil cells (adult), lane 2 - muscle, 
lane 3 - thymus, lane 4 - heart, lane 5 - adrenal. For 12b: lane 1 - 
kidney, lane 2 - placenta (8 weeks), lane 3 - placenta (11 weeks), lanes 
4-7 - mole (complete hydatidiform mole), lane 8 - cytotrophoblast cells 
(freshly isolated), lane 9 - cytotrophoblast cells (1.5 h in vitro), lane 
10 - cytotrophoblast cells (6 h in vitro), lane 11 - cytotrophoblast cells 
(18 h in vitro), lane 12 - cytotrophoblast cells (48 h in vitro). For 12c: 
lane 1 - JAR bladder cell line, lane 2 - NCITT testicular tumor cell line, 
lane 3 - SW-480 human hepatoma cell line, lane 4 - HTR (cytotrophoblasts 
transformed by SV40), lane 5 - HPTLP-I hepatocellular carcinoma cell line, 
lane 6 - EJ-28 bladder carcinoma cell line. For 12d: lane 1 - SK-hep-1 
human hepatoma cell line, lane 2 - DAMI human megakaryocyte cell line, 
lane 3 - DAMI cell line + PMA, lane 4 - CHRF cell line + PMA, lane 5 - 
CHRF cell line. For 12e: lane 1 - ABAE bovine aortic endothelial cells, 
lane 2 - 1063 human ovarian cell line, lane 3 - human breast carcinoma 
MDA435 cell line, lane 4 - human breast carcinoma MDA231 cell line. 

FIG. 13 presents a comparison between nucleotide sequences of the 
human hpa and a mouse EST cDNA fragment (SEQ ID NO: 12) which is 80 
% homologous to the 3' end (starting at nucleotide 1066 of SEQ ID NO:9) 
of the human hpa. The aligned termination codons are underlined. 

FIG. 14 demonstrates the chromosomal localization of the hpa gene. 
PCR products of DNA derived from somatic cell hybrids and of genomic 
DNA of hamster, mouse and human were separated on 0.7 % agarose gel 
following amplification with hpa specific primers. Lane 1 - Lambda DNA 
digested with BstEII, lane 2 - no DNA control, lanes 3 - 29, PCR 
amplification products. Lanes 3-5 - hamster, mouse and human genomic 
DNA, respectively. Lanes 6-29, human monochromosomal somatic cell 
hybrids representing chromosomes 1-22 and X and Y, respectively. Lane 
30 - Lambda DNA digested with BstEII. An amplification product of 
approximately 2.8 Kb is observed only in lanes 5 and 9, representing human 
genomic DNA and DNA derived from cell hybrid carrying human 
chromosome 4, respectively. These results demonstrate that the hpa gene is 
localized in human chromosome 4. 
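
The lane assignment given for FIG. 14 can be checked with a short sketch. The Python snippet below maps gel lanes 6-29 onto chromosomes 1-22, X and Y and confirms that lane 9 corresponds to the hybrid carrying chromosome 4. The function and constant names are illustrative assumptions and are not part of the disclosure.

```python
# Chromosomes carried by the monochromosomal hybrids, in gel loading order.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]

# First gel lane that holds a somatic cell hybrid (lanes 6-29 in FIG. 14).
FIRST_HYBRID_LANE = 6


def lane_to_chromosome(lane):
    """Return the human chromosome represented by a hybrid lane."""
    index = lane - FIRST_HYBRID_LANE
    if not 0 <= index < len(CHROMOSOMES):
        raise ValueError(f"lane {lane} does not hold a somatic cell hybrid")
    return CHROMOSOMES[index]


# Lane 9 is the only hybrid lane reported to show the 2.8 Kb product.
positive_chromosome = lane_to_chromosome(9)
print(positive_chromosome)
```

With 24 hybrids loaded from lane 6 onward, the last hybrid lane (29) holds the Y chromosome, which agrees with the lane range stated in the legend.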

FIG. 15 demonstrates the genomic exon-intron structure of the 
human hpa locus (top) and the relative positions of the lambda clones used 
as sequencing templates to sequence the locus (below). The vertical 
rectangles represent exons (E) and the horizontal lines therebetween 
represent introns (I), upstream (U) and downstream (D) regions. 
Continuous lines represent DNA fragments, which were used for sequence 
analysis. The discontinuous line in lambda 6 represents a region, which 
overlaps with lambda 8 and hence was not analyzed. The plasmid contains 
a PCR product, which bridges the gap between L3 and L6. 

FIG. 16 presents the nucleotide sequence of the genomic region of 
the hpa gene. Exon sequences appear in upper case and intron sequences in 
lower case. The deduced amino acid sequence of the exons is printed below 
the nucleotide sequence. Two predicted transcription start sites are shown 
in bold. 

FIG. 17 presents an alignment of the amino acid sequences of human 
heparanase, mouse and partial sequences of rat homologues. The human 
and the mouse sequences were determined by sequence analysis of the 
isolated cDNAs. The rat sequence is derived from two different EST 
clones, which represent two different regions (5' and 3') of the rat hpa 
cDNA. The human sequence and the amino acids in the mouse and rat 
homologues, which are identical to the human sequence, appear in bold. 

FIG. 18 presents a heparanase Zoo blot. Ten micrograms of genomic 
DNA from various sources were digested with EcoRI and separated on 0.7 
% agarose - TBE gel. Following electrophoresis, the gel was treated with 
HCl and then with NaOH and the DNA fragments were downward 
transferred to a nylon membrane (Hybond N+, Amersham) with 0.4 N 
NaOH. The membrane was hybridized with a 1.6 Kb DNA probe that 
contained the entire hpa cDNA. Lane order: H - Human; M - Mouse; Rt - 
Rat; P - Pig; Cw - Cow; Hr - Horse; S - Sheep; Rb - Rabbit; D - Dog; Ch 
- Chicken; F - Fish. Size markers (Lambda BstEII) are shown on the left. 

FIG. 19 demonstrates the secondary structure prediction for 
heparanase performed using the PHD server - Profile network Prediction 
Heidelberg. H - helix, E - extended (beta strand). The glutamic acid 
predicted as the proton donor is marked by an asterisk and the possible 
nucleophiles are underlined. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The present invention is of a polynucleotide or nucleic acid, referred 
to hereinbelow interchangeably as hpa, hpa cDNA or hpa gene or identified 
by its SEQ ID NOs, encoding a polypeptide having heparanase activity, 
vectors or nucleic acid constructs including same and which are used for 
over-expression or antisense inhibition of heparanase, genetically modified 
cells expressing same, recombinant protein having heparanase activity, 
antisense oligonucleotides and ribozymes for heparanase modulation, and 
heparanase promoter sequences which can be used to direct the expression 
of desired genes. 

Before explaining at least one embodiment of the invention in detail, 
it is to be understood that the invention is not limited in its application to the 
details of construction and the arrangement of the components set forth in 
the following description or illustrated in the drawings. The invention is 
capable of other embodiments or of being practiced or carried out in various 
ways. Also, it is to be understood that the phraseology and terminology 
employed herein is for the purpose of description and should not be 
regarded as limiting. 



Cloning of the human and mouse hpa genes, cDNAs and genomic 
sequence (for human), encoding heparanase, and expression of recombinant 
heparanase by transfected cells are reported herein. These are the first 
mammalian heparanase genes to be cloned. 

A purified preparation of heparanase isolated from human hepatoma 
cells was subjected to tryptic digestion and microsequencing. 

The YGPDVGQPR (SEQ ID NO:8) sequence thus revealed was used to 
screen EST databases for homology to the corresponding back translated 
DNA sequences. Two closely related EST sequences were identified and 
were thereafter found to be identical. 
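
As a minimal sketch of the back-translation idea mentioned above, the following Python snippet converts the tryptic peptide YGPDVGQPR into a degenerate DNA sequence written in IUPAC ambiguity codes, assuming the standard genetic code. The function names and the reduced codon table are illustrative assumptions; the snippet does not reproduce the database search that was actually performed.

```python
# Standard genetic code codons, limited to the residues found in YGPDVGQPR.
CODONS = {
    "Y": ["TAT", "TAC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "D": ["GAT", "GAC"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "Q": ["CAA", "CAG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}

# IUPAC ambiguity codes keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("ACT"): "H", frozenset("CGT"): "B", frozenset("ACG"): "V",
    frozenset("AGT"): "D", frozenset("ACGT"): "N",
}


def back_translate(peptide):
    """Return a degenerate DNA sequence covering every codon choice.

    Each codon position is collapsed independently, so the result can
    cover a few extra codons (for example MGN for arginine also matches
    AGT and AGC, which encode serine).
    """
    out = []
    for residue in peptide:
        codons = CODONS[residue]
        for position in range(3):
            out.append(IUPAC[frozenset(c[position] for c in codons)])
    return "".join(out)


def degeneracy(peptide):
    """Count the distinct DNA sequences that encode the peptide exactly."""
    total = 1
    for residue in peptide:
        total *= len(CODONS[residue])
    return total


probe = back_translate("YGPDVGQPR")
print(probe, degeneracy("YGPDVGQPR"))
```

The high degeneracy of a nine-residue peptide illustrates why a database search against translated sequences is a practical way to use such short microsequencing results.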

Both clones contained an insert of 1020 bp which includes an open 
reading frame of 973 bp followed by a 3' untranslated region of 27 bp and a 
Poly A tail, whereas a translation start site was not identified. 

Cloning of the missing 5' end was performed by PCR amplification 
of DNA from placenta Marathon RACE cDNA composite using primers 
selected according to the EST clones sequence and the linkers of the 
composite. 

A 900 bp PCR fragment, partially overlapping with the identified 3' 
encoding EST clones was obtained. The joined cDNA fragment (hpa), 
1721 bp long (SEQ ID NO:9), contained an open reading frame which 
encodes, as shown in Figure 1 and SEQ ID NO: 11, a polypeptide of 543 
amino acids (SEQ ID NO:10) with a calculated molecular weight of 61,192 
daltons. 
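
The figures quoted above can be sanity-checked with simple arithmetic. The Python sketch below shows how an average molecular weight is usually computed from a sequence (sum of average residue masses plus one water), applies it to the short tryptic peptide YGPDVGQPR because the full 543-residue sequence is not reproduced here, and derives the mean mass per residue and the coding length implied by the stated numbers. The residue mass table holds commonly used average values and the function name is hypothetical; neither is taken from the disclosure.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mass(sequence):
    """Average molecular weight of an unmodified polypeptide in daltons."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


# Worked example on the microsequenced peptide.
peptide_mass = average_mass("YGPDVGQPR")

# Consistency checks on the figures stated for the 543-residue polypeptide.
mass_per_residue = 61192 / 543   # mean residue mass implied by the text
orf_nucleotides = 3 * (543 + 1)  # 543 codons plus one stop codon

print(round(peptide_mass, 2), round(mass_per_residue, 2), orf_nucleotides)
```

A mean of roughly 110 to 115 daltons per residue is typical of proteins, and the implied coding length fits within the 1721 bp cDNA.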

A single nucleotide difference at position 799 (A to T) between the 
EST clones and the PCR amplified cDNA was observed. This difference 
results in a single amino acid substitution (Tyr to Phe) (Figure 1). 
Furthermore, the published EST sequences contained an unidentified 
nucleotide, which following DNA sequencing of both the EST clones was 
resolved into two nucleotides (G and C at positions 1630 and 1631 in SEQ 
ID NO:9, respectively). 
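
The Tyr to Phe substitution follows directly from the genetic code. As a small illustration, the Python snippet below applies an A to T change at the middle position of both tyrosine codons and shows that each becomes a phenylalanine codon. That the change falls on the second codon position is an inference from the standard genetic code, not a statement made in the text, and the helper name is hypothetical.

```python
# Codons of the standard genetic code for the two residues involved.
TYR_CODONS = {"TAT", "TAC"}
PHE_CODONS = {"TTT", "TTC"}


def substitute(codon, position, base):
    """Return the codon with the base at a zero-based position replaced."""
    return codon[:position] + base + codon[position + 1:]


# Apply the A to T change at the middle (second) codon position.
mutated = {codon: substitute(codon, 1, "T") for codon in sorted(TYR_CODONS)}

for original, changed in mutated.items():
    print(original, "->", changed, "Phe" if changed in PHE_CODONS else "other")
```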

The ability of the hpa gene product to catalyze degradation of 
heparan sulfate in an in vitro assay was examined by expressing the entire 
open reading frame in insect cells, using the Baculovirus expression system. 

Extracts and conditioned media of cells infected with virus 
containing the hpa gene demonstrated a high level of heparan sulfate 
degradation activity both towards soluble ECM-derived HSPG and intact 
ECM, which was inhibited by heparin, while cells infected with a similar 
construct containing no hpa gene had no such activity, nor did non-infected 
cells. 

The expression pattern of hpa RNA in various tissues and cell lines 
was investigated using RT-PCR. It was found to be expressed only in 
tissues and cells previously known to have heparanase activity. 

An extended 5' sequence was cloned from the human SK-hep1 cell line 
by PCR amplification using Marathon RACE. The 5' 
extended sequence of the SK-hep1 hpa cDNA was assembled with the 
sequence of the hpa cDNA isolated from human placenta (SEQ ID NO:9). 
The assembled sequence contained an open reading frame, SEQ ID NOs: 13 
and 15, which encodes, as shown in SEQ ID NOs: 14 and 15, a polypeptide 
of 592 amino acids, with a calculated molecular weight of 66,407 daltons. 
This open reading frame was shown to direct the expression of catalytically 
active heparanase in a mammalian cell expression system. The expressed 
heparanase was detectable by anti-heparanase antibodies in Western blot 
analysis. 

A panel of monochromosomal human/CHO and human/mouse 
somatic cell hybrids was used to localize the human heparanase gene to 
human chromosome 4. The newly isolated heparanase sequence can 
therefore be used to identify a chromosome region harboring a human 
heparanase gene in a chromosome spread. 

The hpa cDNA was then used as a probe to screen a human 
genomic library. Several phages were positive. These phages were 
analyzed and were found to cover most of the hpa locus, except for a small 
portion which was recovered by bridging PCR. The hpa locus covers about 
50,000 bp. The hpa gene includes 12 exons separated by 11 introns. 

RT-PCR performed on a variety of cells revealed alternatively 
spliced hpa transcripts. 

The amino acid sequence of human heparanase was used to search 
for homologous sequences in the DNA and protein databases. Several 
human ESTs were identified, as well as mouse sequences highly 
homologous to human heparanase. The following mouse ESTs were 
identified: AA177901, AA674378, AA67997, AA047943, AA690179 and 
AI122034, all of which share an identical sequence and correspond to amino 
acids 336-543 of the human heparanase sequence. The entire mouse 
heparanase cDNA was cloned, based on the nucleotide sequence of the mouse 
ESTs, using Marathon cDNA libraries. The mouse and the human hpa genes 
share an average homology of 78 % between the nucleotide sequences and 
81 % similarity between the deduced amino acid sequences. hpa homologous 
sequences from rat were also uncovered (ESTs AI060284 and AI237828). 

A homology search of the heparanase amino acid sequence against the 
DNA and the protein databases and prediction of its protein secondary 
structure enabled the identification of candidate amino acids that 
participate in the heparanase active site. 

Expression of hpa antisense in mammalian cell lines resulted in 
an about five-fold decrease in the number of recoverable cells as compared 
to controls. 



Human hpa cDNA was shown to hybridize with genomic DNAs of a 
variety of mammalian species and of an avian species. 

The human and mouse hpa promoters were identified and the human 
promoter was shown to direct the expression of a reporter gene. 

Thus, according to the present invention there is provided an isolated 
nucleic acid comprising a genomic, complementary or composite 
polynucleotide sequence encoding a polypeptide having heparanase 
catalytic activity. 

The phrase "composite polynucleotide sequence" refers to a sequence 
which includes exonal sequences required to encode the polypeptide having 
heparanase activity, as well as any number of intronal sequences. The 
intronal sequences can be of any source and typically will include conserved 
splicing signal sequences. Such intronal sequences may further include 
cis-acting expression regulatory elements. 

The term "heparanase catalytic activity" or its equivalent term 
"heparanase activity" refers to a mammalian endoglycosidase 
hydrolyzing activity which is specific for heparan or heparan sulfate 
proteoglycan substrates, as opposed to the activity of bacterial enzymes 
(heparinase I, II and III) which degrade heparin or heparan sulfate by means 
of β-elimination (37). 

According to a preferred embodiment of the present invention the 
polynucleotide or a portion thereof is hybridizable with SEQ ID NOs: 9, 13, 
42, 43 or a portion thereof at 68 °C in 6 x SSC, 1 % SDS, 5 x Denhardt's, 10 
% dextran sulfate, 100 µg/ml salmon sperm DNA, and 32P labeled probe 
and wash at 68 °C with 3, 2, 1, 0.5 or 0.1 x SSC and 0.1 % SDS. 
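
The stringency of the washes listed above rises as the SSC concentration falls, because less salt is available to stabilize mismatched duplexes. As a rough illustration, and assuming the standard formulation of 1 x SSC (0.15 M sodium chloride and 0.015 M trisodium citrate, which the text does not spell out), the following Python sketch computes the sodium ion concentration of each wash. The function name is hypothetical.

```python
# Assumed composition of 1 x SSC (standard laboratory formulation).
NACL_MOLAR_PER_X = 0.15
CITRATE_MOLAR_PER_X = 0.015  # trisodium citrate gives three Na+ per unit


def sodium_molarity(ssc_fold):
    """Approximate Na+ concentration (mol/l) of an SSC dilution."""
    return ssc_fold * (NACL_MOLAR_PER_X + 3 * CITRATE_MOLAR_PER_X)


# Wash concentrations named in the hybridization conditions.
WASHES = [3, 2, 1, 0.5, 0.1]
sodium_by_wash = {fold: sodium_molarity(fold) for fold in WASHES}

for fold in WASHES:
    print(f"{fold} x SSC: {sodium_by_wash[fold]:.4f} M Na+")
```

Under this assumption the final 0.1 x SSC wash contains about thirty times less sodium than the 3 x SSC wash, which is why it is the most stringent of the listed conditions.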

According to another preferred embodiment of the present invention 
the polynucleotide or a portion thereof is at least 60 %, preferably at least 65 
%, more preferably at least 70 %, more preferably at least 75 %, more 
preferably at least 80 %, more preferably at least 85 %, more preferably at 
least 90 %, most preferably, 95-100 % identical with SEQ ID NOs: 9, 13, 
42, 43 or portions thereof as determined using the Bestfit procedure of the 
DNA sequence analysis software package developed by the Genetics 
Computer Group (GCG) at the University of Wisconsin (gap creation 
penalty - 12, gap extension penalty - 4 - which are the default parameters). 
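
The identity tiers above can be illustrated with a short sketch. The Python snippet below takes two sequences that have already been aligned (gaps written as "-"), computes percent identity over the columns in which neither sequence has a gap, and reports the highest tier that is met. It does not reimplement Bestfit; the choice of denominator, the toy sequences and the function names are assumptions made for illustration only.

```python
# Lower bounds of the identity tiers named in the text, in percent.
THRESHOLDS = [60, 65, 70, 75, 80, 85, 90, 95]


def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * matches / compared


def highest_tier(identity):
    """Return the highest threshold met, or None if below the lowest."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


# Hypothetical aligned fragments, not taken from any SEQ ID NO.
identity = percent_identity("ATGCCGTA-GCT", "ATGACGTACGCT")
print(round(identity, 2), highest_tier(identity))
```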

According to another preferred embodiment of the present invention 
the polypeptide encoded by the polynucleotide sequence is as set forth in 
SEQ ID NOs: 10, 14, 44 or portions thereof having heparanase catalytic 
activity. Such portions are expected to include amino acids Asp-Glu 
224-225 (SEQ ID NO: 10), which can serve as proton donors and glutamic 
acid 343 or 396 which can serve as a nucleophile. 

According to another preferred embodiment of the present invention 
the polypeptide encoded by the polynucleotide sequence is at least 60 %, 
preferably at least 65 %, more preferably at least 70 %, more preferably at 
least 75 %, more preferably at least 80 %, more preferably at least 85 %, 
more preferably at least 90 %, most preferably, 95-100 % homologous (both 
similar and identical amino acids) to SEQ ID NOs:10, 14, 44 or portions 
thereof as determined with the Smith-Waterman algorithm, using the 
Bioaccelerator platform developed by Compugen (gapop: 10.0, gapext: 0.5, 
matrix: blosum62, see also the description to Figure 17). 
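
For readers unfamiliar with the Smith-Waterman algorithm, the following Python sketch computes a local alignment score with affine gap penalties, using the gap opening (10.0) and gap extension (0.5) values quoted above. It is a simplified stand-in and not the Bioaccelerator implementation: a flat match/mismatch score replaces the blosum62 matrix, a gap of length k is assumed to cost gap_open + (k - 1) * gap_extend, and the toy sequences and function name are hypothetical.

```python
def smith_waterman_score(a, b, match=5.0, mismatch=-4.0,
                         gap_open=10.0, gap_extend=0.5):
    """Best local alignment score of a and b with affine gap penalties."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    # H: best score ending at (i, j); E/F: best score ending in a gap.
    H = [[0.0] * cols for _ in range(rows)]
    E = [[neg_inf] * cols for _ in range(rows)]  # gap consumes b only
    F = [[neg_inf] * cols for _ in range(rows)]  # gap consumes a only
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are never allowed to drop below zero.
            H[i][j] = max(0.0, H[i - 1][j - 1] + pair, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Toy example: the second sequence carries a two-letter insertion.
score = smith_waterman_score("GATTACAGATTACA", "GATTACATTGATTACA")
print(score)
```

In the toy example the two seven-letter blocks are joined across one gap of length two, so the score is 14 matches minus one gap opening and one extension. A percent homology figure would additionally require a traceback and a substitution matrix to classify similar residues.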

Further according to the present invention there is provided a nucleic 
acid construct comprising the isolated nucleic acid described herein. The 
construct may, and preferably does, further include an origin of replication 
and trans regulatory elements, such as promoter and enhancer sequences. 

The construct or vector can be of any type. It may be a phage which 
infects bacteria or a virus which infects eukaryotic cells. It may also be a 
plasmid, phagemid, cosmid, bacmid or an artificial chromosome. 

Further according to the present invention there is provided a host 
cell comprising the nucleic acid construct described herein. The host cell 
can be of any type. It may be a prokaryotic cell, a eukaryotic cell, a cell 
line, or a cell as a portion of an organism. The polynucleotide encoding 
heparanase can be permanently or transiently present in the cell. In other 
words, genetically modified cells obtained following stable or transient 
transfection, transformation or transduction are all within the scope of the 
present invention. The polynucleotide can be present in the cell in low copy 
(say 1-5 copies) or high copy number (say 5-50 copies or more). It may be 
integrated in one or more chromosomes at any location or be present as 
extrachromosomal material. 

The present invention is further directed at providing a heparanase 
over-expression system which includes a cell overexpressing heparanase 
catalytic activity. The cell may be a genetically modified host cell 
transiently or stably transfected or transformed with any suitable vector 
which includes a polynucleotide sequence encoding a polypeptide having 
heparanase activity and suitable promoter and enhancer sequences to 
direct over-expression of heparanase. However, the overexpressing cell 
may also be a product of an insertion (e.g., via homologous recombination) 
of a promoter and/or enhancer sequence downstream to the endogenous 
heparanase gene of the expressing cell, which will direct over-expression 
from the endogenous gene. 

The term "over-expression" as used herein in the specification and 
claims below refers to a level of expression which is higher than a basal 
level of expression typically characterizing a given cell under otherwise 
identical conditions. 

According to another aspect the present invention provides an 
antisense oligonucleotide comprising a polynucleotide or a polynucleotide 
analog of at least 10, preferably 11-15, more preferably 16-17, more 
preferably 18, more preferably 19-25, more preferably 26-35, most 
preferably 35-100 bases being hybridizable in vivo, under physiological 
conditions, with a portion of a polynucleotide strand encoding a polypeptide 
having heparanase catalytic activity. The antisense oligonucleotide can be 
used for downregulating heparanase activity by in vivo administration 
thereof to a patient. As such, the antisense oligonucleotide according to the 
present invention can be used to treat types of cancers which are 
characterized by impaired (over) expression of heparanase, and are 
dependent on the expression of heparanase for proliferating or forming 
metastases. 

The antisense oligonucleotide can be DNA or RNA or even include 
nucleotide analogs, examples of which are provided in the Background 
section hereinabove. The antisense oligonucleotide according to the present 
invention can be synthetic and is preferably prepared by solid phase 
synthesis. In addition, it can be of any desired length which still provides 
specific base pairing (e.g., 8 or 10, preferably more, nucleotides long) and it 
can include mismatches that do not hamper base pairing under physiological 
conditions. 
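
The base pairing requirement described above amounts to the oligonucleotide being the reverse complement of its target. The Python sketch below derives an antisense DNA oligonucleotide for a chosen window of a sense strand and enforces the minimum length of 10 bases named in the text. The sense sequence is an arbitrary placeholder and does not come from any SEQ ID NO, and the function names are hypothetical.

```python
# Watson-Crick complements for the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

MIN_LENGTH = 10  # lower bound on oligonucleotide length named in the text


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def design_antisense(sense, start, length):
    """Antisense oligo (5' to 3') against sense[start:start + length]."""
    if length < MIN_LENGTH:
        raise ValueError(f"oligonucleotide must be at least {MIN_LENGTH} bases")
    target = sense[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the sequence")
    return reverse_complement(target)


# Arbitrary placeholder sense strand, used only to exercise the functions.
SENSE = "ATGGCTAGCTTGACCGTAAGC"
oligo = design_antisense(SENSE, 0, 18)
print(oligo)
```

Taking the reverse complement of the oligonucleotide returns the targeted window, which is a convenient check that the designed sequence pairs with the intended region.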

Further according to the present invention there is provided a 
pharmaceutical composition comprising the antisense oligonucleotide 
herein described and a pharmaceutically acceptable carrier. The carrier can 
be, for example, a liposome loadable with the antisense oligonucleotide. 

According to a preferred embodiment of the present invention the 
antisense oligonucleotide further includes a ribozyme sequence. The 
ribozyme sequence serves to cleave a heparanase RNA molecule to which 
the antisense oligonucleotide binds, to thereby downregulate heparanase 
expression. 

Further according to the present invention there is provided an 
antisense nucleic acid construct comprising a promoter sequence and a 
polynucleotide sequence directing the synthesis of an antisense RNA 
sequence of at least 10 bases being hybridizable in vivo, under physiological 
conditions, with a portion of a polynucleotide strand encoding a polypeptide 
having heparanase catalytic activity. Like the antisense oligonucleotide, the 
antisense construct can be used for downregulating heparanase activity by in 
vivo administration thereof to a patient. As such, the antisense construct, 
like the antisense oligonucleotide, according to the present invention can be 
used to treat types of cancers which are characterized by impaired (over) 
expression of heparanase, and are dependent on the expression of 
heparanase for proliferating or forming metastases. 

Thus, further according to the present invention there is provided a 
pharmaceutical composition comprising the antisense construct herein 
described and a pharmaceutically acceptable carrier. The carrier can be, for 
example, a liposome loadable with the antisense construct. 

Formulations for topical administration may include, but are not 
limited to, lotions, ointments, gels, creams, suppositories, drops, liquids, 
sprays and powders. Conventional pharmaceutical carriers, aqueous, 
powder or oily bases, thickeners and the like may be necessary or desirable. 
Coated condoms, stents, active pads, and other medical devices may also be 
useful. Compositions for oral administration include powders or granules, 
suspensions or solutions in water or non-aqueous media, sachets, capsules 
or tablets. Thickeners, diluents, flavorings, dispersing aids, emulsifiers or 
binders may be desirable. Formulations for parenteral administration may 
include, but are not limited to, sterile aqueous solutions which may also 
contain buffers, diluents and other suitable additives. 

Dosing is dependent on severity and responsiveness of the condition 
to be treated, but will normally be one or more doses per day, week or 
month with course of treatment lasting from several days to several months 
or until a cure is effected or a diminution of disease state is achieved. 
Persons ordinarily skilled in the art can easily determine optimum dosages, 
dosing methodologies and repetition rates. 

Further according to the present invention there is provided a nucleic 
acid construct comprising a polynucleotide sequence functioning as a 
promoter, the polynucleotide sequence is derived from SEQ ID NO:42 and 
includes at least nucleotides 2135-2635, preferably 2235-2635, more 
preferably 2335-2635, more preferably 2435-2635, most preferably 
2535-2635 thereof, or SEQ ID NO:43 and includes at least nucleotides 
1-420, preferably 120-420, more preferably 220-420, most preferably 
320-420, thereof. These nucleotides are shown in the example section that 
follows to direct the synthesis of a reporter gene in transformed cells. Thus, 
further according to the present invention there is provided a method of 
expressing a polynucleotide sequence comprising the step of ligating the 
polynucleotide sequence downstream to either of the promoter sequences 
described herein. Heparanase promoters can be isolated from a variety of 
mammalian and other species by cloning genomic regions present 5' to the 
coding sequence thereof. This is readily achievable by one ordinarily 
skilled in the art using the heparanase polynucleotides described herein, 
which are shown in the Examples section that follows to participate in 
efficient cross species hybridization. 
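
The nested promoter ranges above are given as 1-based, inclusive nucleotide positions. As a small illustration of how such ranges translate into sequence slices, the Python sketch below extracts each range from a placeholder string and reports its length. The placeholder strings of "N" merely stand in for SEQ ID NOs:42 and 43, whose sequences and full lengths are not reproduced here, and the function name is hypothetical.

```python
def segment(sequence, start, end):
    """Return positions start..end (1-based, inclusive) of a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range lies outside the sequence")
    return sequence[start - 1:end]


# Promoter ranges listed in the text, from broadest to narrowest.
SEQ42_RANGES = [(2135, 2635), (2235, 2635), (2335, 2635),
                (2435, 2635), (2535, 2635)]
SEQ43_RANGES = [(1, 420), (120, 420), (220, 420), (320, 420)]

# Placeholders only as long as the highest position referred to.
placeholder_42 = "N" * 2635
placeholder_43 = "N" * 420

lengths_42 = [len(segment(placeholder_42, s, e)) for s, e in SEQ42_RANGES]
lengths_43 = [len(segment(placeholder_43, s, e)) for s, e in SEQ43_RANGES]

print(lengths_42)
print(lengths_43)
```

Each successive range shares the same 3' boundary and trims about 100 nucleotides from the 5' side, so the narrowest fragments are roughly 100 nucleotides long.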

Further according to the present invention there is provided a 
recombinant protein comprising a polypeptide having heparanase catalytic 
activity. The protein according to the present invention includes 
modifications known as post translational modifications, including, but not 
limited to, proteolysis (e.g., removal of a signal peptide and of a pro- or 
preprotein sequence), methionine modification, glycosylation, alkylation 
(e.g., methylation), acetylation, etc. According to preferred embodiments 
the polypeptide includes at least a portion of SEQ ID NOs:10, 14 or 44, 
which portion has heparanase catalytic activity. According to preferred 
embodiments of the present invention the protein is encoded by any of the 
above described isolated nucleic acids. Further according to the present 
invention there is provided a pharmaceutical composition comprising, as an 
active ingredient, the recombinant protein described herein. 

The recombinant protein may be purified by any conventional protein 
purification procedure close to homogeneity and/or be mixed with additives. 
The recombinant protein may be manufactured using any of the genetically 
modified cells described above, which include any of the expression nucleic 
acid constructs described herein. The recombinant protein may be in any 
form. It may be in a crystallized form, a dehydrated powder form or in 
solution. The recombinant protein may be useful in obtaining pure 
heparanase, which in turn may be useful in eliciting anti-heparanase 
antibodies, either polyclonal or monoclonal antibodies, and as a screening 
active ingredient in a screening assay or system for heparanase inhibitors 
or drugs. 

Further according to the present invention there is provided a method
of identifying a chromosome region harboring a human heparanase gene in
a chromosome spread. The method is executed by implementing the following
method steps. In a first step, the chromosome spread (either
interphase or metaphase spread) is hybridized with a tagged polynucleotide
probe encoding heparanase. The tag is preferably a fluorescent tag. In a
second step, the chromosome spread is washed, thereby removing excess
non-hybridized probe. Finally, signals associated with the hybridized
tagged polynucleotide probe are searched for, wherein detected signals are
indicative of a chromosome region harboring the human heparanase gene.
One ordinarily skilled in the art would know how to use the sequences
disclosed herein in suitable labeling reactions and how to use the tagged
probes to detect, using in situ hybridization, a chromosome region
harboring a human heparanase gene.

Further according to the present invention there is provided a method
of in vivo eliciting anti-heparanase antibodies comprising the step of
administering a nucleic acid construct including a polynucleotide segment
corresponding to at least a portion of SEQ ID NOs:9, 13 or 43 and a
promoter for directing the expression of said polynucleotide segment in
vivo. Accordingly, there is provided also a DNA vaccine for in vivo
eliciting anti-heparanase antibodies comprising a nucleic acid construct
including a polynucleotide segment corresponding to at least a portion of
SEQ ID NOs:9, 13 or 43 and a promoter for directing the expression of said
polynucleotide segment in vivo. The vaccine optionally further includes a
pharmaceutically acceptable carrier, such as a virus, liposome or an antigen
presenting cell. Alternatively, the vaccine is employed as a naked DNA
vaccine.

The present invention can be used to develop treatments for various
diseases, to develop diagnostic assays for these diseases and to provide new
tools for basic research, especially in the fields of medicine and biology.



Specifically, the present invention can be used to develop new drugs
to inhibit tumor cell metastasis, inflammation and autoimmunity. The
identification of the hpa gene encoding for the heparanase enzyme enables
the production of a recombinant enzyme in heterologous expression
systems.

Furthermore, the present invention can be used to modulate
bioavailability of heparin-binding growth factors, cellular responses to
heparin-binding growth factors (e.g., bFGF, VEGF) and cytokines (e.g.,
IL-8), cell interaction with plasma lipoproteins, cellular susceptibility to
viral, protozoal and some bacterial infections, and disintegration of
neurodegenerative plaques. Recombinant heparanase offers a potential
treatment for wound healing, angiogenesis, restenosis, atherosclerosis,
inflammation, neurodegenerative diseases (such as, for example,
Gerstmann-Straussler Syndrome, Creutzfeldt-Jakob disease, Scrapie and
Alzheimer's disease) and certain viral and some bacterial and protozoal
infections. Recombinant heparanase can be used to neutralize plasma
heparin, as a potential replacement of protamine.

As used herein, the term "modulate" includes substantially inhibiting,
slowing or reversing the progression of a disease, substantially ameliorating
clinical symptoms of a disease or condition, or substantially preventing the
appearance of clinical symptoms of a disease or condition. A "modulator"
therefore includes an agent which may modulate a disease or condition.
Modulation of viral, protozoal and bacterial infections includes any effect
which substantially interrupts, prevents or reduces any viral, bacterial or
protozoal activity and/or stage of the virus, bacterium or protozoon life
cycle, or which reduces or prevents infection by the virus, bacterium or
protozoon in a subject, such as a human or lower animal.

As used herein, the term "wound" includes any injury to any portion 
of the body of a subject including, but not limited to, acute conditions such 
as thermal burns, chemical burns, radiation burns, burns caused by excess
exposure to ultraviolet radiation such as sunburn, damage to bodily tissues 
such as the perineum as a result of labor and childbirth, including injuries 
sustained during medical procedures such as episiotomies, trauma-induced 
injuries including cuts, those injuries sustained in automobile and other 
mechanical accidents, and those caused by bullets, knives and other 
weapons, and post-surgical injuries, as well as chronic conditions such as 
pressure sores, bedsores, conditions related to diabetes and poor circulation, 
and all types of acne, etc. 

Anti-heparanase antibodies, raised against the recombinant enzyme, 
would be useful for immunodetection and diagnosis of micrometastases, 
autoimmune lesions and renal failure in biopsy specimens, plasma samples, 
and body fluids. Such antibodies may also serve as neutralizing agents for 
heparanase activity.

The genomic heparanase sequences described herein can be used to
construct knock-in and knock-out constructs. Such constructs include a
fragment of 10-20 Kb of a heparanase locus and negative and positive
selection markers and can be used to provide heparanase knock-in and
knock-out animal models by methods known to the skilled artisan. Such
animal models can be used for studying the function of heparanase in
developmental processes, and in normal as well as pathological processes.
They can also serve as an experimental model for testing drugs and gene
therapy protocols. The complementary heparanase sequence (cDNA) can
be used to derive transgenic animals overexpressing heparanase for the
same purposes. Alternatively, if cloned in the antisense orientation, the
complementary heparanase sequence (cDNA) can be used to derive transgenic
animals under-expressing heparanase for the same purposes.

The heparanase promoter sequences described herein and other cis
regulatory elements linked to the heparanase locus can be used to regulate
the expression of genes. For example, these promoters can be used to
direct the expression of a cytotoxic protein, such as TNF, in tumor cells. It
will be appreciated that heparanase itself is abnormally expressed under the
control of its own promoter and other cis acting elements in a variety of
tumors, and its expression is correlated with metastasis. It is also
abnormally highly expressed in inflammatory cells. The introns of the
heparanase gene can be used for the same purpose, as it is known that
introns, especially upstream introns, include cis acting elements which
affect expression. A heparanase promoter fused to a reporter protein can be
used to study/monitor its activity.

The polynucleotide sequences described herein can also be used to
provide DNA vaccines which will elicit in vivo anti-heparanase antibodies.
Such vaccines can therefore be used to combat inflammation and cancer.

Antisense oligonucleotides derived according to the heparanase 
sequences described herein, especially such oligonucleotides supplemented 
with ribozyme activity, can be used to modulate heparanase expression. 
Such oligonucleotides can be from the coding region, from the introns or 
promoter specific. Antisense heparanase nucleic acid constructs can 
similarly function, as well known in the art. 

The heparanase sequences described herein can be used to study the 
catalytic mechanism of heparanase. Carefully selected site directed 
mutagenesis can be employed to provide modified heparanase proteins 
having modified characteristics in terms of, for example, substrate 
specificity, sensitivity to inhibitors, etc. 

While studying heparanase expression in a variety of cell types,
alternatively spliced transcripts were identified. Such transcripts, if found
characteristic of certain pathological conditions, can be used as markers for
such conditions. Such transcripts are expected to direct the synthesis of
heparanases with altered functions.

Additional objects, advantages, and novel features of the present
invention will become apparent to one ordinarily skilled in the art upon
examination of the following examples, which are not intended to be
limiting. Additionally, each of the various embodiments and aspects of the
present invention as delineated hereinabove and as claimed in the claims
section below finds experimental support in the following examples.



EXAMPLES 

Generally, the nomenclature used herein and the laboratory
procedures in recombinant DNA technology described below are those well
known and commonly employed in the art. Standard techniques are used for
cloning, DNA and RNA isolation, amplification and purification. Generally,
enzymatic reactions involving DNA ligase, DNA polymerase, restriction
endonucleases and the like are performed according to the manufacturers'
specifications. These techniques and various other techniques are generally
performed according to Sambrook et al., Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989),
which is incorporated herein by reference. Other general references are
provided throughout this document. The procedures therein are believed to
be well known in the art and are provided for the convenience of the reader.
All the information contained therein is incorporated herein by reference.

The following protocols and experimental details are referenced in
the Examples that follow:

Purification and characterization of heparanase from a human
hepatoma cell line and human placenta: A human hepatoma cell line
(Sk-hep-1) was chosen as a source for purification of a human
tumor-derived heparanase. Purification was essentially as described in U.S.
Pat. No. 5,362,641 to Fuks, which is incorporated by reference as if fully set
forth herein. Briefly, 500 liters, 5 x 10^11 cells, were grown in suspension
and the heparanase enzyme was purified about 240,000 fold by applying the
following steps: (i) cation exchange (CM-Sephadex) chromatography
performed at pH 6.0, 0.3-1.4 M NaCl gradient; (ii) cation exchange
(CM-Sephadex) chromatography performed at pH 7.4 in the presence of
0.1 % CHAPS, 0.3-1.1 M NaCl gradient; (iii) heparin-Sepharose
chromatography performed at pH 7.4 in the presence of 0.1 % CHAPS,
0.35-1.1 M NaCl gradient; (iv) ConA-Sepharose chromatography performed
at pH 6.0 in buffer containing 0.1 % CHAPS and 1 M NaCl, elution with
0.25 M alpha-methyl mannoside; and (v) HPLC cation exchange (Mono-S)
chromatography performed at pH 7.4 in the presence of 0.1 % CHAPS,
0.25-1 M NaCl gradient.

Active fractions were pooled, precipitated with TCA and the
precipitate subjected to SDS polyacrylamide gel electrophoresis and/or
tryptic digestion and reverse phase HPLC. Tryptic peptides of the purified
protein were separated by reverse phase HPLC (C8 column) and
homogeneous peaks were subjected to amino acid sequence analysis.

The purified enzyme was applied to reverse phase HPLC and
subjected to N-terminal amino acid sequencing using the amino acid
sequencer (Applied Biosystems).

Cells: Cultures of bovine corneal endothelial cells (BCECs) were
established from steer eyes as previously described (19, 38). Stock cultures
were maintained in DMEM (1 g glucose/liter) supplemented with 10 %
newborn calf serum and 5 % FCS. bFGF (1 ng/ml) was added every other
day during the phase of active cell growth (13, 14).

Preparation of dishes coated with ECM: BCECs (second to fifth
passage) were plated into 4-well plates at an initial density of 2 x 10^5
cells/ml and cultured in sulfate-free Fisher medium plus 5 % dextran T-40
for 12 days. Na2[35S]O4 (25 µCi/ml) was added on days 1 and 5 after seeding
and the cultures were incubated with the label without medium change. The
subendothelial ECM was exposed by dissolving (5 min., room temperature)
the cell layer with PBS containing 0.5 % Triton X-100 and 20 mM NH4OH,
followed by four washes with PBS. The ECM remained intact, free of
cellular debris and firmly attached to the entire area of the tissue culture
dish (19, 22).

To prepare soluble sulfate labeled proteoglycans (peak I material),
the ECM was digested with trypsin (25 µg/ml, 6 h, 37 °C), the digest was
concentrated by reverse dialysis and the concentrated material was applied
onto a Sepharose 6B gel filtration column. The resulting high molecular
weight material (Kav < 0.2, peak I) was collected. More than 80 % of the
labeled material was shown to be composed of heparan sulfate
proteoglycans (11, 39).

Heparanase activity: Cells (1 x 10^6/35-mm dish), cell lysates or
conditioned media were incubated on top of 35S-labeled ECM (18 h, 37 °C)
in the presence of 20 mM phosphate buffer (pH 6.2). Cell lysates and
conditioned media were also incubated with sulfate labeled peak I material
(10-20 µl). The incubation medium was collected, centrifuged (18,000 x g,
4 °C, 3 min.), and sulfate labeled material analyzed by gel filtration on a
Sepharose CL-6B column (0.9 x 30 cm). Fractions (0.2 ml) were eluted
with PBS at a flow rate of 5 ml/h and counted for radioactivity using
Bio-fluor scintillation fluid. The excluded volume (V0) was marked by
blue dextran and the total included volume (Vt) by phenol red. The latter
was shown to comigrate with free sulfate (7, 11, 23). Degradation fragments
of HS side chains were eluted from Sepharose 6B at 0.5 < Kav < 0.8 (peak
II) (7, 11, 23). A nearly intact HSPG released from ECM by trypsin - and,
to a lower extent, during incubation with PBS alone - was eluted next to V0
(Kav < 0.2, peak I). Recoveries of labeled material applied on the columns
ranged from 85 to 95 % in different experiments (11). Each experiment was
performed at least three times and the variation of elution positions (Kav
values) did not exceed +/- 15 %.
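
The elution positions above are expressed as Kav values. The following is a minimal sketch of how such values may be derived, assuming the conventional gel filtration definition Kav = (Ve - V0) / (Vt - V0), where Ve is the elution volume of a fraction. The function names and the fraction numbers in the usage note are hypothetical illustrations and are not taken from the experiments described herein.

```python
# Illustrative sketch only. The definition Kav = (Ve - V0) / (Vt - V0) is the
# conventional one and is assumed here; the helper names are hypothetical.

FRACTION_VOLUME_ML = 0.2  # fraction size stated in the protocol


def fraction_to_ml(fraction_number: int) -> float:
    """Cumulative elution volume at the end of a given fraction."""
    return fraction_number * FRACTION_VOLUME_ML


def kav(elution_ml: float, v0_ml: float, vt_ml: float) -> float:
    """Partition coefficient of a fraction eluting at elution_ml."""
    if vt_ml <= v0_ml:
        raise ValueError("Vt must be larger than V0")
    return (elution_ml - v0_ml) / (vt_ml - v0_ml)


def classify(kav_value: float) -> str:
    """Assign a Kav value to the peaks as defined in the text."""
    if kav_value < 0.2:
        return "peak I"   # nearly intact HSPG, eluted next to V0
    if 0.5 < kav_value < 0.8:
        return "peak II"  # HS degradation fragments
    return "unassigned"
```

For example, if blue dextran were to mark V0 at 2.0 ml and phenol red were to mark Vt at 10.0 ml (hypothetical values), a fraction eluting at 3.0 ml would have Kav = 0.125 and fall in peak I, whereas a fraction eluting at 7.2 ml would have Kav = 0.65 and fall in peak II.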

Cloning of hpa cDNA: cDNA clones 257548 and 260138 were
obtained from the I.M.A.G.E Consortium (2130 Memorial Parkway SW,
Huntsville, AL 35801). The cDNAs were originally cloned in EcoRI and
NotI cloning sites in the plasmid vector pT3T7D-Pac. Although these
clones are reported to be somewhat different, DNA sequencing
demonstrated that these clones are identical to one another. Marathon
RACE (rapid amplification of cDNA ends) human placenta (poly-A) cDNA
composite was a gift of Prof. Yossi Shiloh of Tel Aviv University. This
composite is vector free, as it includes reverse transcribed cDNA fragments
to which double, partially single stranded adapters are attached on both
sides. The construction of the specific composite employed is described in
reference 39a.

Amplification of the hp3 PCR fragment was performed according to the
protocol provided by Clontech laboratories. The template used for
amplification was a sample taken from the above composite. The primers
used for amplification were:

First step: 5'-primer: AP1: 5'-CCATCCTAATACGACTCACTATAGGGC-3',
SEQ ID NO:1; 3'-primer: HPL229: 5'-GTAGTGATGCCATGTAACTGAATC-3',
SEQ ID NO:2.

Second step: nested 5'-primer: AP2: 5'-ACTCACTATAGGGCTCGAGCGGC-3',
SEQ ID NO:3; nested 3'-primer: HPL171:
5'-GCATCTTAGCCGTCTTTCTTCG-3', SEQ ID NO:4. The HPL229 and
HPL171 were selected according to the sequence of the EST clones. They
include nucleotides 933-956 and 876-897 of SEQ ID NO:9, respectively.
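
The primer details above lend themselves to a simple consistency check. The following is a minimal sketch, in which the primer sequences are copied from the text while the helper names and the choice of checks are illustrative assumptions. It confirms that the length of each gene-specific primer agrees with the stated nucleotide span and computes basic primer properties.

```python
# Primer sequences are copied from the text; the helper names are
# illustrative and are not part of the original protocol.

PRIMERS = {
    "AP1": "CCATCCTAATACGACTCACTATAGGGC",
    "HPL229": "GTAGTGATGCCATGTAACTGAATC",
    "AP2": "ACTCACTATAGGGCTCGAGCGGC",
    "HPL171": "GCATCTTAGCCGTCTTTCTTCG",
}

# Positions in SEQ ID NO:9 stated for the gene-specific primers
# (1-based, inclusive).
STATED_POSITIONS = {"HPL229": (933, 956), "HPL171": (876, 897)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C residues in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def length_matches_span(name: str) -> bool:
    """True if the primer length equals the length of its stated span."""
    start, end = STATED_POSITIONS[name]
    return len(PRIMERS[name]) == end - start + 1
```

Both gene-specific primers pass the length check (24 and 22 nucleotides, respectively). Since HPL229 and HPL171 are 3'-primers, it is their reverse complements, as returned by `reverse_complement`, that would be expected to read in the sense orientation. The nested adapter primer AP2 begins with the last 14 nucleotides of AP1.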

PCR program was 94 °C - 4 min., followed by 30 cycles of 94 °C -
40 sec, 62 °C - 1 min., 72 °C - 2.5 min. Amplification was performed with
Expand High Fidelity (Boehringer Mannheim). The resulting ca. 900 bp
hp3 PCR product was digested with BfrI and PvuII. Clone 257548 (phpa1)
was digested with EcoRI, followed by end filling, and was then further
digested with BfrI. Thereafter the PvuII - BfrI fragment of the hp3 PCR
product was cloned into the blunt end - BfrI end of clone phpa1, which
resulted in having the entire cDNA cloned in pT3T7-pac vector, designated
phpa2.

RT-PCR: RNA was prepared using TRI-Reagent (Molecular
Research Center, Inc.) according to the manufacturer's instructions. 1.25 µg
were taken for reverse transcription reaction using MuMLV Reverse
transcriptase (Gibco BRL) and Oligo (dT)15 primer, SEQ ID NO:5,
(Promega). Amplification of the resultant first strand cDNA was
performed with Taq polymerase (Promega). The following primers were
used:

HPU-355: 5'-TTCGATCCCAAGAAGGAATCAAC-3', SEQ ID NO:6,
nucleotides 372-394 in SEQ ID NOs:9 or 11.

HPL-229: 5'-GTAGTGATGCCATGTAACTGAATC-3', SEQ ID NO:7,
nucleotides 933-956 in SEQ ID NOs:9 or 11.

PCR program: 94 °C - 4 min., followed by 30 cycles of 94 °C - 40 
sec, 62 °C - 1 min., 72 °C - 1 min. 
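
The two cycling programs given so far (for the hp3 amplification and for RT-PCR) can be written down as data, which makes them easy to compare. The following is a minimal sketch, in which the class and variable names are hypothetical and only the temperatures, hold times and cycle numbers are taken from the text. The computed figure is the sum of programmed hold times and ignores ramping between temperatures, so it is a lower bound on actual run time.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in degrees Celsius
    seconds: int   # programmed hold time


@dataclass(frozen=True)
class Program:
    initial: Step              # initial denaturation
    cycle: Tuple[Step, ...]    # steps repeated in every cycle
    n_cycles: int

    def hold_seconds(self) -> int:
        """Sum of programmed hold times; ramp times are not included."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.n_cycles * per_cycle


# hp3 amplification: 94 C 4 min; 30 x (94 C 40 sec, 62 C 1 min, 72 C 2.5 min)
HP3_PCR = Program(Step(94, 240), (Step(94, 40), Step(62, 60), Step(72, 150)), 30)

# RT-PCR: 94 C 4 min; 30 x (94 C 40 sec, 62 C 1 min, 72 C 1 min)
RT_PCR = Program(Step(94, 240), (Step(94, 40), Step(62, 60), Step(72, 60)), 30)
```

Under these assumptions the two programs differ only in the extension step, which is longer for the ca. 900 bp hp3 product than for the shorter RT-PCR product.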

Alternatively, total RNA was prepared from cell cultures using 
Tri-reagent (Molecular Research Center, Inc.) according to the
manufacturer's recommendations. Poly A+ RNA was isolated from total RNA
using mRNA separator (Clontech). Reverse transcription was performed 
with total RNA using Superscript II (GibcoBRL). PCR was performed with 
Expand high fidelity (Boehringer Mannheim). Primers used for 
amplification were as follows: 

Hpu-685, 5'-GAGCAGCCAGGTGAGCCCAAGAT-3', SEQ ID NO:24
Hpu-355, 5'-TTCGATCCCAAGAAGGAATCAAC-3', SEQ ID NO:25
Hpu 565, 5'-AGCTCTGTAGATGTGCTATACAC-3', SEQ ID NO:26
Hpl 967, 5'-TCAGATGCAAGCAGCAACTTTGGC-3', SEQ ID NO:27
Hpl 171, 5'-GCATCTTAGCCGTCTTTCTTCG-3', SEQ ID NO:28
Hpl 229, 5'-GTAGTGATGCCATGTAACTGAATC-3', SEQ ID NO:29

PCR reaction was performed as follows: 94 °C 3 minutes, followed
by 32 cycles of 94 °C 40 seconds, 64 °C 1 minute, 72 °C 3 minutes, and one
cycle at 72 °C, 7 minutes.

Expression of recombinant heparanase in insect cells: Cells: High
Five and Sf21 insect cell lines were maintained as monolayer cultures in
SF900II-SFM medium (GibcoBRL).

Recombinant Baculovirus: Recombinant virus containing the hpa
gene was constructed using the Bac to Bac system (GibcoBRL). The
transfer vector pFastBac was digested with SalI and NotI and ligated with a
1.7 kb fragment of phpa2 digested with XhoI and NotI. The resulting
plasmid was designated pFasthpa2. An identical plasmid designated
pFasthpa4 was prepared as a duplicate and both independently served for
further experimentations. Recombinant bacmid was generated according to
the instructions of the manufacturer with pFasthpa2, pFasthpa4 and with
pFastBac. The latter served as a negative control. Recombinant bacmid
DNAs were transfected into Sf21 insect cells. Five days after transfection
recombinant viruses were harvested and used to infect High Five insect
cells, 3 x 10^6 cells in T-25 flasks. Cells were harvested 2 - 3 days after
infection. 4 x 10^6 cells were centrifuged and resuspended in a reaction
buffer containing 20 mM phosphate citrate buffer, 50 mM NaCl. Cells
underwent three cycles of freeze and thaw and lysates were stored at -80 °C.
Conditioned medium was stored at 4 °C.

Partial purification of recombinant heparanase: Partial
purification of recombinant heparanase was performed by
heparin-Sepharose column chromatography followed by Superdex 75
column gel filtration. Culture medium (150 ml) of Sf21 cells infected with
pFhpa4 virus was subjected to heparin-Sepharose chromatography. Elution
of 1 ml fractions was performed with 0.35 - 2 M NaCl gradient in presence
of 0.1 % CHAPS and 1 mM DTT in 10 mM sodium acetate buffer, pH 5.0.
A 25 µl sample of each fraction was tested for heparanase activity.
Heparanase activity was eluted at the range of 0.65 - 1.1 M NaCl (fractions
18-26, Figure 10a). 5 µl of each fraction was subjected to 15 %
SDS-polyacrylamide gel electrophoresis followed by silver nitrate staining.
Active fractions eluted from heparin-Sepharose (Figure 10a) were pooled
and concentrated (x 6) on YM3 cut-off membrane. 0.5 ml of the
concentrated material was applied onto 30 ml Superdex 75 FPLC column
equilibrated with 10 mM sodium acetate buffer, pH 5.0, containing 0.8 M
NaCl, 1 mM DTT and 0.1 % CHAPS. Fractions (0.56 ml) were collected at
a flow rate of 0.75 ml/min. Aliquots of each fraction were tested for
heparanase activity and were subjected to SDS-polyacrylamide gel
electrophoresis followed by silver nitrate staining (Figure 11b).

PCR amplification of genomic DNA: 94 °C 3 minutes, followed by 
32 cycles of 94 °C 45 seconds, 64 °C 1 minute, 68 °C 5 minutes, and one 
cycle at 72 °C, 7 minutes. Primers used for amplification of genomic DNA 
included: 

GHpu-L3 5'-AGGCACCCTAGAGATGTTCCAG-3', SEQ ID NO:30
GHpl-L6 5'-GAAGATTTCTGTTTCCATGACGTG-3', SEQ ID NO:31.

Screening of genomic libraries: A human genomic library in
Lambda phage EMBL3 SP6/T7 (Clontech, Palo Alto, CA) was screened.
5 x plaques were plated at 5 x 10^4 pfu/plate on NZCYM agar/top
agarose plates. Phages were absorbed on nylon membranes in duplicates
(Qiagen). Hybridization was performed at 65 °C in 5 x SSC, 5 x Denhardt's,
10 % dextran sulfate, 100 µg/ml salmon sperm DNA, 32P labeled probe (10^
cpm/ml). A 1.6 kb fragment, containing the entire hpa cDNA, was labeled
by random priming (Boehringer Mannheim). Following hybridization,
membranes were washed once with 2 x SSC, 0.1 % SDS at 65 °C for 20
minutes, and twice with 0.2 x SSC, 0.1 % SDS at 65 °C for 15 minutes.
Hybridizing plaques were picked, and plated at 100 pfu/plate.
Hybridization was performed as above and single isolated positive plaques
were picked.

Phage DNA was extracted using a Lambda DNA extraction kit
(Qiagen). DNA was digested with XhoI and EcoRI, separated on 0.7 %
agarose gel and transferred to nylon membrane Hybond N+ (Amersham).
Hybridization and washes were performed as above.

cDNA Sequence analysis: Sequence determinations were performed
with vector specific and gene specific primers, using an automated DNA
sequencer (Applied Biosystems, model 373A). Each nucleotide was read
from at least two independent primers.

Genomic sequence analysis: Large-scale sequencing was performed
by Commonwealth Biotechnology Incorporation.

Isolation of mouse hpa: Mouse hpa cDNA was amplified from 
either Marathon ready cDNA library of mouse embryo or from mRNA 
isolated from mouse melanoma cell line BL6, using the Marathon RACE kit 
from Clontech. Both procedures were performed according to the 
manufacturer's recommendation. 

Primers used for PCR amplification of mouse hpa:
Mhpl773 5'-CCACACTGAATGTAATACTGAAGTG-3', SEQ ID NO:32
MHpl736 5'-CGAAGCTCTGGAACTCGGCAAG-3', SEQ ID NO:33
MHpl83 5'-GCCAGCTGCAAAGGTGTTGGAC-3', SEQ ID NO:34
Mhpl152 5'-AACACCTGCCTCATCACGACTTC-3', SEQ ID NO:35
Mhpl114 5'-GCCAGGCTGGCGTCGATGGTGA-3', SEQ ID NO:36
MHpl103 5'-GTCGATGGTGATGGACAGGAAC-3', SEQ ID NO:37
Ap1 5'-GTAATACGACTCACTATAGGGC-3', SEQ ID NO:38 (Genome walker)
Ap2 5'-ACTATAGGGCACGCGTGGT-3', SEQ ID NO:39 (Genome walker)
Ap1 5'-CCATCCTAATACGACTCACTATAGGGC-3', SEQ ID NO:40 (Marathon RACE)
Ap2 5'-ACTCACTATAGGGCTCGAGCGGC-3', SEQ ID NO:41 (Marathon RACE)

Southern analysis of genomic DNA: Genomic DNA was extracted
from animal or from human blood using Blood and cell culture DNA maxi
kit (Qiagen). DNA was digested with EcoRI, separated by gel
electrophoresis and transferred to a nylon membrane Hybond N+
(Amersham). Hybridization was performed at 68 °C in 6 x SSC, 1 % SDS,
5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA, and 32P
labeled probe. A 1.6 kb fragment, containing the entire hpa cDNA, was
used as a probe. Following hybridization, the membrane was washed with 3
x SSC, 0.1 % SDS, at 68 °C and exposed to X-ray film for 3 days.
Membranes were then washed with 1 x SSC, 0.1 % SDS, at 68 °C and were
reexposed for 5 days.

Construction of hpa promoter-GFP expression vector: Lambda
DNA of phage L3 was digested with SacI and BglII, resulting in a 1712 bp
fragment which contained the hpa promoter (877-2688 of SEQ ID NO:42).
The pEGFP-1 plasmid (Clontech) was digested with BglII and SacI and
ligated with the 1712 bp fragment of the hpa promoter sequence. The
resulting plasmid was designated phpEGL. A second hpa promoter-GFP
plasmid was constructed containing a shorter fragment of the hpa promoter
region: phpEGL was digested with HindIII, and the resulting 1095 bp
fragment (nucleotides 1593-2688 of SEQ ID NO:42) was ligated with
HindIII digested pEGFP-1. The resulting plasmid was designated phpEGS.

Computer analysis of sequences: Homology searches were
performed using several computer servers, and various databases. Blast 2.0
service, at the NCBI server, was used to screen the protein database swplus
and DNA databases such as GenBank, EMBL, and the EST databases.
Blast 2.0 search was performed using the basic search option of the NCBI
server. Sequence analysis and alignments were done using the DNA
sequence analysis software package developed by the Genetics Computer
Group (GCG) at the University of Wisconsin. Alignments of two sequences
were performed using Bestfit (gap creation penalty - 12, gap extension
penalty - 4). Protein homology search was performed with the
Smith-Waterman algorithm, using the Bioaccelerator platform developed by
Compugen. The protein database swplus was searched using the following
parameters: gapop: 10.0, gapext: 0.5, matrix: blosum62. Blocks homology
was performed using the Blocks WWW server developed at Fred
Hutchinson Cancer Research Center in Seattle, Washington, USA.
Secondary structure prediction was performed using the PHD server -
Profile network Prediction Heidelberg. Fold recognition (threading) was
performed using the UCLA-DOE structure prediction server. The method
used for prediction was gonnet+predss. Alignment of three sequences was
performed using the pileup application (gap creation penalty - 5, gap
extension penalty - 1). Promoter analysis was performed using TSSW and
TSSG programs (BCM Search Launcher Human Genome Center, Baylor
College of Medicine, Houston TX).
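
For readers unfamiliar with the Smith-Waterman algorithm referred to above, the following is a minimal sketch of local alignment scoring with affine gap penalties (the Gotoh recurrences). It is not the Bioaccelerator implementation. The gap opening (10.0) and gap extension (0.5) values follow the parameters stated above, whereas the simple match/mismatch scores stand in for the blosum62 matrix and are assumptions made for brevity. A gap of length k is assumed here to cost gap_open + (k - 1) x gap_extend; other conventions exist.

```python
def smith_waterman_score(a, b, match=5.0, mismatch=-4.0,
                         gap_open=10.0, gap_extend=0.5):
    """Best local alignment score of a and b with affine gap penalties.

    Assumption: match/mismatch scoring replaces a substitution matrix, and
    a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0.0] * cols      # best scores for alignments ending in row i-1
    f_prev = [neg_inf] * cols  # row i-1 scores ending with a vertical gap
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # score ending with a horizontal gap
        for j in range(1, cols):
            # Open a new gap from the best score, or extend an existing gap.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            h_cur[j] = max(0.0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

With these assumed scores, two identical 10-residue sequences score 50.0, and inserting a single residue into the middle of one copy of a 20-residue sequence yields 90.0 (20 matches less one gap opening).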

EXAMPLE 1 

Cloning of human hpa cDNA

Purified fraction of heparanase isolated from human hepatoma cells
(SK-hep-1) was subjected to tryptic digestion and microsequencing. EST
(Expressed Sequence Tag) databases were screened for homology to the
back translated DNA sequences corresponding to the obtained peptides.
Two EST sequences (accession Nos. N41349 and N45367) contained a
DNA sequence encoding the peptide YGPDVGQPR (SEQ ID NO:8).
These two sequences were derived from clones 257548 and 260138
(I.M.A.G.E Consortium) prepared from 8 to 9 weeks placenta cDNA library
(Soares). Both clones, which were found to be identical, contained an insert
of 1020 bp which included an open reading frame (ORF) of 973 bp
followed by a 3' untranslated region of 27 bp and a Poly A tail. No
translation start site (AUG) was identified at the 5' end of these clones.

Cloning of the missing 5' end was performed by PCR amplification
of DNA from a placenta Marathon RACE cDNA composite. A 900 bp
fragment (designated hp3), partially overlapping with the identified 3'
encoding EST clones, was obtained.



The joined cDNA fragment, 1721 bp long (SEQ ID NO:9), contained
an open reading frame which encodes, as shown in Figure 1 and SEQ ID
NO:11, a polypeptide of 543 amino acids (SEQ ID NO:10) with a calculated
molecular weight of 61,192 daltons. The 3' end of the partial cDNA inserts
contained in clones 257548 and 260138 started at nucleotide G721 of SEQ
ID NO:9 and Figure 1.
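
The figures given above can be cross-checked with simple arithmetic. The following is a minimal sketch, in which the constants are taken from the text and the helper names are hypothetical. It computes the coding length implied by the polypeptide length, the mean mass per residue implied by the calculated molecular weight, and the length of the cDNA region extending from nucleotide 721 to the 3' end.

```python
# Constants taken from the text; helper names are illustrative only.
CDNA_LENGTH_BP = 1721    # joined cDNA, SEQ ID NO:9
PROTEIN_LENGTH_AA = 543  # polypeptide, SEQ ID NO:10
CALC_MW_DA = 61192       # calculated molecular weight in daltons
EST_START_NT = 721       # nucleotide at which the EST clone inserts start


def coding_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues, optionally with a stop codon."""
    return 3 * n_residues + (3 if include_stop else 0)


def mean_residue_mass(mw_da: float, n_residues: int) -> float:
    """Average mass per amino acid residue."""
    return mw_da / n_residues


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive nucleotide span."""
    return end - start + 1


orf_nt = coding_length_nt(PROTEIN_LENGTH_AA)               # 1632
untranslated_nt = CDNA_LENGTH_BP - orf_nt                  # 89
est_region_nt = span_length(EST_START_NT, CDNA_LENGTH_BP)  # 1001
```

An open reading frame of 1632 nucleotides, stop codon included, fits within the 1721 bp cDNA, and the implied mean residue mass of about 112.7 daltons is in the range typical of proteins. The region from nucleotide 721 to the end of SEQ ID NO:9 spans 1001 nucleotides.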

As further shown in Figure 1, there was a single sequence
discrepancy between the EST clones and the PCR amplified sequence,
which led to an amino acid substitution from Tyr246 in the EST to Phe246 in
the amplified cDNA. The nucleotide sequence of the PCR amplified cDNA
fragment was verified from two independent amplification products. The
new gene was designated hpa.

As stated above, the 3' end of the partial cDNA inserts contained in
EST clones 257548 and 260138 started at nucleotide 721 of hpa (SEQ ID
NO:9). The ability of the hpa cDNA to form stable secondary structures,
such as stem and loop structures involving nucleotide stretches in the
vicinity of position 721, was investigated using computer modeling. It was
found that stable stem and loop structures are likely to be formed involving
nucleotides 698-724 (SEQ ID NO:9). In addition, a high GC content, up to
70 %, characterizes the 5' end region of the hpa gene, as compared to only
about 40 % in the 3' region. These findings may explain the premature
termination and therefore lack of 5' ends in the EST clones.

To examine the ability of the hpa gene product to catalyze
degradation of heparan sulfate in an in vitro assay, the entire open reading
frame was expressed in insect cells, using the Baculovirus expression
system. Extracts of cells infected with virus containing the hpa gene
demonstrated a high level of heparan sulfate degradation activity, while
cells infected with a similar construct containing no hpa gene had no such
activity, nor did non-infected cells. These results are further demonstrated
in the following Examples.
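
The comparison of GC content between the 5' and 3' regions of hpa noted above in this Example can be reproduced for any sequence with a sliding window. The following is a minimal sketch, in which the function names, window size and toy sequence are assumptions for illustration. The toy sequence is not SEQ ID NO:9 and the resulting values are not those of the hpa cDNA.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C residues in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int):
    """Yield (1-based window start, GC percent) for each full window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start + 1, gc_percent(seq[start:start + window])


# Toy sequence, NOT SEQ ID NO:9: a GC-rich stretch followed by an AT-rich one.
toy = "GCGCCGGCAC" * 3 + "ATTAGCATTA" * 3
profile = list(windowed_gc(toy, window=10, step=10))
```

For the toy sequence, the first three windows are 90 % GC and the last three are 20 % GC. Applied to a real cDNA with a larger window, such a profile would show whether the 5' region is markedly richer in GC than the 3' region.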

EXAMPLE 2 
Degradation of soluble ECM-derived HSPG 

Monolayer cultures of High Five cells were infected (72 h, 28 °C)
with recombinant Baculovirus containing the pFasthpa plasmid or with
control virus containing an insert free plasmid. The cells were harvested
and lysed in heparanase reaction buffer by three cycles of freezing and
thawing. The cell lysates were then incubated (18 h, 37 °C) with sulfate
labeled, ECM-derived HSPG (peak I), followed by gel filtration analysis
(Sepharose 6B) of the reaction mixture.

As shown in Figure 2, the substrate alone included almost entirely
high molecular weight (Mr) material eluted next to V0 (peak I, fractions
5-20, Kav < 0.35). A similar elution pattern was obtained when the HSPG
substrate was incubated with lysates of cells that were infected with control
virus. In contrast, incubation of the HSPG substrate with lysates of cells
infected with the hpa containing virus resulted in a complete conversion of
the high Mr substrate into low Mr labeled degradation fragments (peak II,
fractions 22-35, 0.5 < Kav < 0.75).

Fragments eluted in peak II were shown to be degradation products
of heparan sulfate, as they were (i) 5- to 6-fold smaller than intact heparan
sulfate side chains (Kav approx. 0.33) released from ECM by treatment with
either alkaline borohydride or papain; and (ii) resistant to further digestion
with papain or chondroitinase ABC, and susceptible to deamination by
nitrous acid (6, 11). Similar results (not shown) were obtained with
Sf21 cells. Again, heparanase activity was detected in cells infected with
the hpa containing virus (pFhpa), but not with control virus (pF). This
result was obtained with two independently generated recombinant viruses.
Lysates of control non-infected High Five cells failed to degrade the HSPG
substrate.

In subsequent experiments, the labeled HSPG substrate was 
incubated with medium conditioned by infected High Five or Sf21 cells. 

As shown in Figures 3a-b, heparanase activity, reflected by the
conversion of the high Mr peak I substrate into the low Mr peak II which
represents HS degradation fragments, was found in the culture medium of
cells infected with the pFhpa2 or pFhpa4 viruses, but not with the control
pF1 or pF2 viruses. No heparanase activity was detected in the culture
medium of control non-infected High Five or Sf21 cells.

The medium of cells infected with the pFhpa4 virus was passed
through a 50 kDa cut-off membrane to obtain a crude estimation of the
molecular weight of the recombinant heparanase enzyme. As demonstrated
in Figure 4, all the enzymatic activity was retained in the upper
compartment and there was no activity in the flow-through (<50 kDa)
material. This result is consistent with the expected molecular weight of the
hpa gene product.

In order to further characterize the hpa product, the inhibitory effect
of heparin, a potent inhibitor of heparanase-mediated HS degradation (40),
was examined.

As demonstrated in Figures 5a-b, conversion of the peak I substrate 
into peak II HS degradation fragments was completely abolished in the 
presence of heparin. 

Altogether, these results indicate that the heparanase enzyme is
expressed in an active form by insect cells infected with Baculovirus
containing the newly identified human hpa gene.

EXAMPLE 3
Degradation of HSPG in intact ECM

Next, the ability of intact infected insect cells to degrade HS in
intact, naturally produced ECM was investigated. For this purpose, High
Five or Sf21 cells were seeded on metabolically sulfate-labeled ECM
followed by infection (48 h, 28 °C) with either the pFhpa4 or control pF2
viruses. The pH of the medium was then adjusted to pH 6.2-6.4 and the
cells further incubated with the labeled ECM for another 48 h at 28 °C or 24
h at 37 °C. Sulfate-labeled material released into the incubation medium
was analyzed by gel filtration on Sepharose 6B.

As shown in Figures 6a-b and 7a-b, incubation of the ECM with cells
infected with the control pF2 virus resulted in a constant release of labeled
material that consisted almost entirely (>90%) of high Mr fragments (peak
I) eluted with or next to V0. It was previously shown that a proteolytic
activity residing in the ECM itself and/or expressed by cells is responsible
for release of the high Mr material (6). This nearly intact HSPG provides a
soluble substrate for subsequent degradation by heparanase, as also
indicated by the relatively large amount of peak I material accumulating
when the heparanase enzyme is inhibited by heparin (6, 7, 12, Figure 9). On
the other hand, incubation of the labeled ECM with cells infected with the
pFhpa4 virus resulted in release of 60-70% of the ECM-associated
radioactivity in the form of low Mr sulfate-labeled fragments (peak II, 0.5
< Kav < 0.75), regardless of whether the infected cells were incubated with
the ECM at 28 °C or 37 °C. Control intact non-infected Sf21 or High Five
cells failed to degrade the ECM HS side chains.

In subsequent experiments, as demonstrated in Figures 8a-b, High
Five and Sf21 cells were infected (96 h, 28 °C) with pFhpa4 or control pF1
viruses and the culture medium incubated with sulfate-labeled ECM. Low
Mr HS degradation fragments were released from the ECM only upon
incubation with medium conditioned by pFhpa4 infected cells. As shown in
Figure 9, production of these fragments was abolished in the presence of
heparin. No heparanase activity was detected in the culture medium of
control, non-infected cells. These results indicate that the heparanase
enzyme expressed by cells infected with the pFhpa4 virus is capable of
degrading HS when complexed to other macromolecular constituents (i.e.,
fibronectin, laminin, collagen) of a naturally produced intact ECM, in a
manner similar to that reported for highly metastatic tumor cells or activated
cells of the immune system (6, 7).

EXAMPLE 4 
Purification of recombinant human heparanase 
The recombinant heparanase was partially purified from medium of
pFhpa4 infected Sf21 cells by Heparin-Sepharose chromatography (Figure
10a) followed by gel filtration of the pooled active fractions over an FPLC
Superdex 75 column (Figure 11a). A ~63 kDa protein was observed,
whose quantity, as detected by silver-stained SDS-polyacrylamide gel
electrophoresis, correlated with heparanase activity in the relevant column
fractions (Figures 10b and 11b, respectively). This protein was not detected
in the culture medium of cells infected with the control pF1 virus when that
medium was subjected to a similar fractionation on heparin-Sepharose (not
shown).

EXAMPLE 5 

Expression of the human hpa cDNA in various cell types, organs and tissues

Referring now to Figures 12a-e, RT-PCR was applied to evaluate the
expression of the hpa gene by various cell types and tissues. For this
purpose, total RNA was reverse transcribed and amplified. The expected
585 bp long cDNA was clearly demonstrated in human kidney, placenta (8
and 11 weeks) and mole tissues, as well as in freshly isolated and
short-term (1.5-48 h) cultured human placental cytotrophoblastic cells (Figure
12a), all known to express a high heparanase activity (41). The hpa
transcript was also expressed by normal human neutrophils (Figure 12b). In
contrast, there was no detectable expression of the hpa mRNA in embryonic
human muscle tissue, thymus, heart and adrenal (Figure 12b). The hpa gene
was expressed by several, but not all, human bladder carcinoma cell lines
(Figure 12c), SK hepatoma (SK-hep-1), ovarian carcinoma (OV 1063),
breast carcinoma (435, 231), melanoma and megakaryocyte (DAMI,
CHRF) human cell lines (Figures 12d-e).

The above described expression pattern of the hpa transcript
correlated very well with heparanase activity levels
determined in various tissues and cell types (not shown).

EXAMPLE 6 

Isolation of an extended 5' end of hpa cDNA from human SK-hep1 cell line

The 5' end of hpa cDNA was isolated from the human SK-hep1 cell line
by PCR amplification using the Marathon RACE (rapid amplification of
cDNA ends) kit (Clontech). Total RNA was prepared from SK-hep1 cells
using the TRI-Reagent (Molecular Research Center, Inc.) according to the
manufacturer's instructions. Poly A+ RNA was isolated using the mRNA
separator kit (Clontech).

The Marathon RACE SK-hep1 cDNA composite was constructed
according to the manufacturer's recommendations. The first round of
amplification was performed using an adaptor specific primer AP1:
5'-CCATCCTAATACGACTCACTATAGGGC-3', SEQ ID NO:1, and a
hpa specific antisense primer hpl-629:
5'-CCCCAGGAGCAGCAGCATCAG-3', SEQ ID NO:17, corresponding to
nucleotides 119-99 of SEQ ID NO:9. The resulting PCR product was
subjected to a second round of amplification using an adaptor specific
nested primer AP2: 5'-ACTCACTATAGGGCTCGAGCGGC-3', SEQ ID
NO:3, and a hpa specific antisense nested primer hpl-666:
5'-AGGCTTCGAGCGCAGCAGCAT-3', SEQ ID NO:18, corresponding to
nucleotides 83-63 of SEQ ID NO:9. The PCR program was as follows: a
hot start of 94 °C for 1 minute, followed by 30 cycles of 90 °C - 30 seconds,
68 °C - 4 minutes. The resulting 300 bp DNA fragment was extracted from
an agarose gel and cloned into the vector pGEM-T Easy (Promega). The
resulting recombinant plasmid was designated pHPSK1.
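
The stated primer coordinates can be checked against the first 120
nucleotides of SEQ ID NO:9 as printed in the Sequence Listing. The sketch
below is illustrative only; the helper name antisense is hypothetical, and
it assumes that a range written high-to-low (e.g., 119-99) denotes the
reverse complement of that stretch of the sense strand.

```python
# First 120 nucleotides of SEQ ID NO:9, copied from the Sequence Listing.
SEQ9_5PRIME = (
    "CTAGAGCTTTCGACTCTCCGCTGCGCGGCAGCTGGCGGGGGGAGCAGCCAGGTGAGCCCA"
    "AGATGCTGCTGCGCTCGAAGCCTGCGCTGCCGCCGCCGCTGATGCTGCTGCTCCTGGGGC"
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(seq, hi, lo):
    """Reverse complement of nucleotides lo..hi (1-based, inclusive)."""
    return seq[lo - 1:hi].translate(COMPLEMENT)[::-1]


# Primer name -> (sequence 5'->3', first coordinate, last coordinate).
PRIMERS = {
    "hpl-629": ("CCCCAGGAGCAGCAGCATCAG", 119, 99),
    "hpl-666": ("AGGCTTCGAGCGCAGCAGCAT", 83, 63),
}

for name, (primer, hi, lo) in PRIMERS.items():
    match = antisense(SEQ9_5PRIME, hi, lo) == primer
    print(name, "matches" if match else "does not match")
```

The same comparison can be applied to any other antisense primer whose
coordinates fall within the portion of SEQ ID NO:9 that is available.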

The nucleotide sequence of the pHPSK1 insert was determined and it
was found to contain 62 nucleotides of the 5' end of the placenta hpa cDNA
(SEQ ID NO:9) and an additional 178 nucleotides upstream, which are the
first 178 nucleotides of SEQ ID NOs:13 and 15.

A single nucleotide discrepancy was identified between the SK-hep1
cDNA and the placenta cDNA. The "T" derivative at position 9 of the
placenta cDNA (SEQ ID NO:9) is replaced by a "C" derivative at the
corresponding position 187 of the SK-hep1 cDNA (SEQ ID NO:13).

The discrepancy is likely to be due to a mutation at the 5' end of the
placenta cDNA clone, as confirmed by sequence analysis of several
additional cDNA clones isolated from placenta, which, like the SK-hep1
cDNA, contained C at position 9 of SEQ ID NO:9.

The 5' extended sequence of the SK-hep1 hpa cDNA was assembled
with the sequence of the hpa cDNA isolated from human placenta (SEQ ID
NO:9). The assembled sequence contained an open reading frame which
encodes, as shown in SEQ ID NOs:14 and 15, a polypeptide of 592 amino
acids with a calculated molecular weight of 66,407 daltons. The open
reading frame is flanked by a 93 bp 5' untranslated region (UTR).



EXAMPLE 7 

Isolation of the upstream genomic region of the hpa gene 
The upstream region of the hpa gene was isolated using the Genome
Walker kit (Clontech) according to the manufacturer's recommendations.
The kit includes five human genomic DNA samples, each digested with a
different restriction endonuclease creating blunt ends: EcoRV, ScaI, DraI,
PvuII and SspI.

The blunt ended DNA fragments are ligated to partially single
stranded adaptors. The genomic DNA samples were subjected to PCR
amplification using the adaptor specific primer and a gene specific primer.
Amplification was performed with Expand High Fidelity (Boehringer
Mannheim).

A first round of amplification was performed using the ap1 primer:
5'-GTAATACGACTCACTATAGGGC-3', SEQ ID NO:19, and the hpa
specific antisense primer hpl-666:
5'-AGGCTTCGAGCGCAGCAGCAT-3', SEQ ID NO:18, corresponding
to nucleotides 83-63 of SEQ ID NO:9. The PCR program was as follows:
a hot start of 94 °C - 3 minutes, followed by 36 cycles of 94 °C - 40
seconds, 67 °C - 4 minutes.

The PCR products of the first amplification were diluted 1:50. One
µl of the diluted sample was used as a template for a second amplification
using a nested adaptor specific primer ap2:
5'-ACTATAGGGCACGCGTGGT-3', SEQ ID NO:20, and a hpa specific
antisense primer hpl-690, 5'-CTTGGGCTCACCTGGCTGCTC-3', SEQ ID
NO:21, corresponding to nucleotides 62-42 of SEQ ID NO:9. The resulting
amplification products were analyzed using agarose gel electrophoresis.
Five different PCR products were obtained from the five amplification
reactions. A DNA fragment of approximately 750 bp, which was obtained
from the SspI digested DNA sample, was gel extracted. The purified
fragment was ligated into the plasmid vector pGEM-T Easy (Promega).
The resulting recombinant plasmid was designated pGHP6905 and the
nucleotide sequence of the hpa insert was determined.

A partial sequence of 594 nucleotides is shown in SEQ ID NO:16.
The last nucleotide in SEQ ID NO:16 corresponds to nucleotide 93 in SEQ
ID NO:13. The DNA sequence in SEQ ID NO:16 contains the 5' region of the
hpa cDNA and 501 nucleotides of the genomic upstream region which are
predicted to contain the promoter region of the hpa gene.

EXAMPLE 8

Expression of the 592 amino acid HPA polypeptide in a human 293 cell line

The 592 amino acid open reading frame (SEQ ID NOs:13 and 15)
was constructed by ligation of the 110 bp corresponding to the 5' end of the
SK-hep1 hpa cDNA with the placenta cDNA. More specifically, the
Marathon RACE-PCR amplification product of the placenta hpa DNA was
digested with SacI and an approximately 1 kb fragment was ligated into a
SacI-digested pGHP6905 plasmid. The resulting plasmid was digested with
EarI and AatII. The EarI sticky ends were blunted and an approximately
280 bp EarI/blunt-AatII fragment was isolated. This fragment was ligated
with pFasthpa digested with EcoRI, which was blunt ended using Klenow
fragment and further digested with AatII. The resulting plasmid contained a
1827 bp insert which includes an open reading frame of 1776 bp, 31 bp of
3' UTR and 21 bp of 5' UTR. This plasmid was designated pFastLhpa.

A mammalian expression vector was constructed to drive the
expression of the 592 amino acid heparanase polypeptide in human cells.
The hpa cDNA was excised from pFastLhpa with BssHII and NotI. The
resulting 1850 bp BssHII-NotI fragment was ligated to a mammalian
expression vector pSI (Promega) digested with MluI and NotI. The
resulting recombinant plasmid, pSIhpaMet2, was transfected into a human
293 embryonic kidney cell line.

Transient expression of the 592 amino acid heparanase was
examined by western blot analysis and the enzymatic activity was tested
using the gel shift assay. Both these procedures are described at length in
U.S. Pat. application No. 09/071,739, filed May 1, 1998, which is
incorporated by reference as if fully set forth herein. Cells were harvested 3
days following transfection. Harvested cells were re-suspended in lysis
buffer containing 150 mM NaCl, 50 mM Tris pH 7.5, 1% Triton X-100, 1
mM PMSF and protease inhibitor cocktail (Boehringer Mannheim). 40
protein extract samples were used for separation on SDS-PAGE. Proteins
were transferred onto a PVDF Hybond-P membrane (Amersham). The
membrane was incubated with an affinity purified polyclonal
anti-heparanase antibody, as described in U.S. Pat. application No. 09/071,739.
A major band of approximately 50 kDa was observed in the transfected
cells, as well as a minor band of approximately 65 kDa. A similar pattern
was observed in extracts of cells transfected with the pShpa, as
demonstrated in U.S. Pat. application No. 09/071,739. These two bands
probably represent two forms of the recombinant heparanase protein
produced by the transfected cells. The 65 kDa protein probably represents a
heparanase precursor, while the 50 kDa protein is suggested herein to be the
processed or mature form.

The catalytic activity of the recombinant protein expressed in the
pShpaMet2 transfected cells was tested by gel shift assay. Cell extracts of
transfected and of mock transfected cells were incubated overnight with
heparin (6 µg in each reaction) at 37 °C, in the presence of 20 mM
phosphate citrate buffer pH 5.4, 1 mM CaCl2, 1 mM DTT and 50 mM
NaCl. Reaction mixtures were then separated on a 10 % polyacrylamide
gel. The catalytic activity of the recombinant heparanase was clearly
demonstrated by a faster migration of the heparin molecules incubated with
the transfected cell extract as compared to the control. Faster migration
indicates the disappearance of high molecular weight heparin molecules and
the generation of low molecular weight degradation products.

EXAMPLE 9 
Chromosomal localization of the hpa gene 
Chromosomal mapping of the hpa gene was performed utilizing a 
panel of monochromosomal human/CHO and human/mouse somatic cell 
hybrids, obtained from the UK HGMP Resource Center (Cambridge, 
England). 

40 ng of each of the somatic cell hybrid DNA samples were
subjected to PCR amplification using the hpa primers: hpu565
5'-AGCTCTGTAGATGTGCTATACAC-3', SEQ ID NO:22,
corresponding to nucleotides 564-586 of SEQ ID NO:9 and an antisense
primer hpl171 5'-GCATCTTAGCCGTCTTTCTTCG-3', SEQ ID NO:23,
corresponding to nucleotides 897-876 of SEQ ID NO:9.

The PCR program was as follows: a hot start of 94 °C - 3 minutes,
followed by 7 cycles of 94 °C - 45 seconds, 66 °C - 1 minute, 68 °C - 5
minutes, followed by 30 cycles of 94 °C - 45 seconds, 62 °C - 1 minute, 68
°C - 5 minutes, and a 10 minutes final extension at 72 °C.
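
For bookkeeping, the two-stage program above can be written down as data
and its programmed hold time summed. This is only a sketch: the variable
and function names are hypothetical, and ramping time between temperatures
is instrument dependent and therefore ignored.

```python
# Each step is (temperature in deg C, duration in seconds), as stated above.
HOT_START = (94, 3 * 60)
STAGES = [
    # (number of cycles, [denaturation, annealing, extension])
    (7, [(94, 45), (66, 60), (68, 5 * 60)]),
    (30, [(94, 45), (62, 60), (68, 5 * 60)]),
]
FINAL_EXTENSION = (72, 10 * 60)


def hold_time_seconds():
    """Sum of all programmed hold times, ramping excluded."""
    total = HOT_START[1] + FINAL_EXTENSION[1]
    for cycles, steps in STAGES:
        total += cycles * sum(duration for _, duration in steps)
    return total


print(f"Programmed hold time: {hold_time_seconds() / 60:.2f} min")
```

The only difference between the two stages is the annealing temperature,
which drops from 66 °C in the first 7 cycles to 62 °C in the remaining 30.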

The reactions were performed with Expand long PCR (Boehringer
Mannheim). The resulting amplification products were analyzed using
agarose gel electrophoresis. As demonstrated in Figure 14, a single band of
approximately 2.8 kb was obtained from chromosome 4, as well as from the
control human genomic DNA. A 2.8 kb amplification product is expected
based on amplification of the genomic hpa clone (data not shown). No
amplification products were obtained either in the control DNA samples of
hamster and mouse or in somatic hybrids of other human chromosomes.



EXAMPLE 10 

Human genomic clone encoding heparanase

Five plaques were isolated following screening of a human genomic
library and were designated L3-1, L5-1, L8-1, L10-1 and L6-1. The phage
DNAs were analyzed by Southern hybridization and by PCR with hpa
specific and vector specific primers. Southern analysis was performed with
three fragments of hpa cDNA: a PvuII-BamHI fragment (nucleotides
32-450, SEQ ID NO:9), a BamHI-NdeI fragment (nucleotides 451-1102,
SEQ ID NO:9) and an NdeI-XhoI fragment (nucleotides 1103-1721, SEQ
ID NO:9).

Following Southern analysis, phages L3, L6 and L8 were selected for
further analysis. A scheme of the genomic region and the relative position
of the three phage clones is depicted in Figure 15. A 2 kb DNA fragment
containing the gap between phages L6 and L3 was PCR amplified from
human genomic DNA with two gene specific primers, GHpuL3 and
GHplL6. The PCR product was cloned into the plasmid vector
pGEM-T Easy (Promega).

Large scale DNA sequencing of the three Lambda clones and the
amplified fragment was performed with Lambda purified DNA by primer
walking. A nucleotide sequence of 44,898 bp was analyzed (Figure 16,
SEQ ID NO:42). Comparison of the genomic sequence with that of hpa
cDNA revealed 12 exons separated by 11 introns (Figures 15 and 16). The
genomic organization of the hpa gene is depicted in Figure 15 (top). The
sequence includes the coding region from the first ATG to the stop codon,
which spans 39,113 nucleotides, 2742 nucleotides upstream of the first
ATG and 3043 nucleotides downstream of the stop codon. Splice site
consensus sequences were identified at exon/intron junctions.

EXAMPLE 11
Alternative splicing

Several minor RT-PCR products were obtained from various cell
types, following amplification with hpa specific primers. Each was found to
contain a deletion of one or two exons. Some of these PCR products
contain ORFs, which encode potential shorter proteins.

Table 1 below summarizes the alternatively spliced products isolated
from various cell lines.

Fragments of similar sizes were obtained following amplification
with two cell lines, placenta and platelets.

Cell type                   Nucleotides deleted   Exons deleted   ORF
Platelets                   1047-1267             8/9             +
Platelets                   1154-1267             9
Platelets                   289-435, 562-735      2, 4
Sk-hep1, platelets, Zr75    562-735               4               +
Sk-hep1 (hepatoma)          561-904               4, 5
Zr75 (breast carcinoma)     96-203                1 (partial)     +



EXAMPLE 12 
Mouse and rat hpa 
EST databases were screened for sequences homologous to the hpa
gene. Three mouse ESTs were identified (accession Nos. AA177901, from
mouse spleen, AA067997 from mouse skin, AA47943 from mouse embryo) and
assembled into an 824 bp cDNA fragment which contains a partial open
reading frame (lacking a 5' end) of 629 bp and a 3' untranslated region of
195 bp (SEQ ID NO:12). As shown in Figure 13, the coding region is 80 %
similar to the 3' end of the hpa cDNA sequence. These ESTs are probably
cDNA fragments of the mouse hpa homolog that encodes the mouse
heparanase.

Searching for consensus protein domains revealed an amino terminal 
homology between the heparanase and several precursor proteins such as 
Procollagen Alpha 1 precursor, Tyrosine-protein kinase-RYK, Fibulin-1, 
Insulin-like growth factor binding protein and several others. The amino 
terminus is highly hydrophobic and contains a potential trans-membrane 
domain. The homology to known signal peptide sequences suggests that it 
could function as a signal peptide for protein localization. 

The amino acid sequence of human heparanase was used to search
for homologous sequences in the DNA and protein databases. Several
human ESTs were identified, as well as mouse sequences highly
homologous to human heparanase. The following mouse ESTs were
identified: AA177901, AA674378, AA67997, AA047943, AA690179 and
AI122034, all of which share an identical sequence corresponding to amino
acids 336-543 of the human heparanase sequence. The entire mouse heparanase
cDNA was cloned, based on the nucleotide sequence of the mouse ESTs.
PCR primers were designed and a Marathon RACE was performed using a
Marathon cDNA library from 15 day mouse embryo (Clontech) and from the
BL6 mouse melanoma cell line. The mouse hpa homologous cDNA was
isolated following several amplification steps. A 1.1 kb fragment was
amplified from the mouse embryo Marathon cDNA library. The first cycle of
amplification was performed with primers mhpl773 and AP1 and the second
cycle with primers mhpl736 and AP2. A 1.1 kb fragment was then
amplified from the BL6 Marathon cDNA library. The first cycle of
amplification was performed with the primers mhpl152 and AP1, and the
second with mhpl83 and AP2. The combined sequence was homologous to
nucleotides 157-1702 of the human hpa cDNA, which encode amino acids
33-543. The 5' end of the mouse hpa gene was isolated from a mouse
genomic DNA library using the Genome Walker kit (Clontech). A 0.9 kb
fragment was amplified from a DraI digested Genome Walker DNA library.
The first cycle of amplification was performed with primers mhpl114 and
AP1 and the second with primers mhpl103 and AP2. The assembled
sequence (SEQ ID NOs:43, 45) is 2396 nucleotides long. It contains an
open reading frame of 1605 nucleotides, which encodes a polypeptide of 535
amino acids (SEQ ID NOs:44, 45), 196 nucleotides of 3' untranslated
region (UTR), and an upstream sequence which includes the promoter
region and the 5' UTR of the mouse hpa cDNA. According to two
promoter predicting programs, TSSW and TSSG, the transcription start site
is localized to nucleotide 431 of SEQ ID NOs:43, 45, 163 nucleotides
upstream of the first ATG codon. The 431 nucleotide upstream genomic
sequence contains the promoter region. A TATA box is predicted at position
394 of SEQ ID NOs:43, 45. The mouse and the human hpa genes share an
average homology of 78 % between the nucleotide sequences and 81 %
similarity between the deduced amino acid sequences.

A search for hpa homologous sequences using the Blast 2.0 server
revealed two ESTs from rat: AI060284 (385 nucleotides, SEQ ID NO:46),
which is homologous to the amino terminus (68 % similarity to amino acids
12-136) of human heparanase, and AI237828 (541 nucleotides, SEQ ID
NO:47), which is homologous to the carboxyl terminus (81 % similarity to
amino acids 500-543) of human heparanase and contains a 3'-UTR. A
comparison between the human heparanase and the mouse and rat
homologous sequences is demonstrated in Figure 17.

EXAMPLE 13 
Prediction of heparanase active site 
A homology search of the heparanase amino acid sequence against the
DNA and the protein databases revealed no significant homologies. The
protein secondary structure, as predicted by the PHD program, consists of
alternating alpha helices and beta sheets. The fold recognition server of
UCLA predicted an alpha/beta barrel structure, with under-threshold
confidence.

Five of the 15 proteins which were predicted to have the most similar folds
were glycosyl hydrolases from various organisms: 1xyza - xylanase from
Clostridium thermocellum, 1pbga - 6-phospho-beta-D-galactosidase from
Lactococcus lactis, 1amy - alpha-amylase from barley, 1ecea -
endocellulase from Acidothermus cellulolyticus and 1qbc -
hexosaminidase alpha chain, glycosyl hydrolase.

A protein homology search using the bioaccelerator pulled out several
proteins, including glycosyl hydrolases such as beta-fructofuranosidase
from Vicia faba (broad bean) and from potato, lactase phlorizin hydrolase
from human, xylanases from Clostridium thermocellum and from
Streptomyces halstedii and cellulase from Clostridium thermocellum.
The Blocks 9.3 database pulled out the active site of glycosyl hydrolases
family five, which includes cellulases from various bacteria and fungi. A
similar active site motif is shared by several lysosomal acid hydrolases (63)
and other glycosyl hydrolases. The common mechanism shared by these
enzymes involves two glutamic acid residues, a proton donor and a
nucleophile.

Despite the lack of an overall homology between the heparanase and
other glycosyl hydrolases, the amino acid couple Asn-Glu (NE), which is
characteristic of the proton donor of glycosyl hydrolases of the GH-A clan,
was found at positions 224-225 of the human heparanase protein sequence.
As in other clan members, this NE couple is located at the end of a β sheet.

Considering the relative location of the proton donor and the
predicted secondary structure, the glutamic acid that functions as
nucleophile is most likely located at position 343 or at position 396.



Identification of the active site and the amino acids directly involved in
hydrolysis opens the way for expression of the defined catalytic domain. In
addition, it will provide the tools for rational design of enzyme activity
either by modification of the microenvironment or of the catalytic site itself.
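
The kind of search described in this Example, locating an NE couple and
then listing downstream glutamic acid residues as candidate nucleophiles,
can be sketched as follows. The function names and the toy sequence are
hypothetical; the toy sequence is not heparanase, and the secondary
structure criterion used in the text is not modeled.

```python
def find_couples(protein, couple="NE"):
    """Return 1-based positions at which the two-residue couple starts."""
    return [i + 1 for i in range(len(protein) - 1) if protein[i:i + 2] == couple]


def glu_downstream(protein, start):
    """1-based positions of Glu (E) residues located after position start."""
    return [i + 1 for i, aa in enumerate(protein) if aa == "E" and i + 1 > start]


toy = "MKTLNEPGAVSWEQLLDEK"  # hypothetical sequence, for illustration only
for pos in find_couples(toy):
    # The Glu of the couple sits at pos + 1; later Glu residues are candidates.
    print(pos, glu_downstream(toy, pos + 1))
```

Applied to the real protein sequence, such a scan would only narrow the
candidates; choosing between them relies on the predicted secondary
structure, as discussed above.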


EXAMPLE 14 
Expression of hpa antisense in mammalian cell lines 
A mammalian expression vector Hpa2Kepcdna3 was constructed in
order to express hpa antisense in mammalian cells. hpa cDNA (1.7 kb
EcoRI fragment) was cloned into the plasmid pCDNA3 in 3'>5' (antisense)
orientation. The construct was used to transfect MBT2-T50 and T24P cell
lines. 2 x 10^ cells in 35 mm plates were transfected using the Fugene
protocol (Boehringer Mannheim). 48 hours after transfection, cells were
trypsinized and seeded in six well plates. 24 hours later G418 was added to
initiate selection. The number of colonies per 35 mm plate after 3
weeks was as follows:



            Antisense   No insert
T24P        15          60
MBT-T50     1           6

The lower number of colonies obtained after transfection with hpa
antisense, as compared with the control plasmid, suggests that the
introduction of hpa antisense interferes with cell growth. This experiment
demonstrates the use of a complementary antisense hpa DNA sequence to
control heparanase expression in cells. This approach may be used to
inhibit expression of heparanase in vivo, in, for example, cancer cells and in
other pathological processes in which heparanase is involved.

EXAMPLE 15 
Zoo blot 

Hpa cDNA was used as a probe to detect homologous sequences in
human DNA and in DNA of various animals. The autoradiogram of the
Southern analysis is presented in Figure 18. Several bands were detected in
human DNA, which correlated with the expected pattern according to the
genomic hpa sequence. Several intense bands were detected in all
mammals, while faint bands were detected in chicken. This correlates with
the phylogenetic relation between human and the tested animals. The
intense bands indicate that hpa is conserved among mammals as well as in
more genetically distant organisms. The multiple band patterns suggest
that in all animals, as in human, the hpa locus occupies a large genomic
region. Alternatively, the various bands could represent homologous
sequences and suggest the existence of a gene family, which can be isolated
based on their homology to the human hpa reported herein. Such
conservation was in fact found between the isolated human hpa cDNA
and the mouse homologue.

EXAMPLE 16

Characterization of the hpa promoter

The DNA sequence upstream of the hpa first ATG was subjected to
computational analysis in order to localize the predicted transcription start
site and to identify potential transcription factor binding sites. Recognition
of the human PolII promoter region and start of transcription were predicted
using the TSSW and TSSG programs. Both programs identified a promoter
region upstream of the coding region. TSSW pointed at nucleotide 2644
and TSSG at 2635 of SEQ ID NO:42. These two predicted transcription
start sites are located 4 and 13 nucleotides upstream of the longest hpa
cDNA isolated by RACE.
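
The coordinates quoted for SEQ ID NO:42 can be cross-checked with simple
arithmetic. The sketch below uses only figures stated in Examples 10 and
16, reading the coding span in Example 10 as 39,113 nucleotides; the
variable names are hypothetical, and it assumes that "N nucleotides
upstream" means a coordinate difference of exactly N.

```python
# Values stated in Examples 10 and 16 (coordinates refer to SEQ ID NO:42).
UPSTREAM_OF_ATG = 2742      # nucleotides upstream of the first ATG
CODING_SPAN = 39113         # first ATG through the stop codon
DOWNSTREAM_OF_STOP = 3043   # nucleotides downstream of the stop codon
TOTAL_LENGTH = 44898        # length of the analyzed genomic sequence

# The three segments should add up to the full analyzed sequence.
assert UPSTREAM_OF_ATG + CODING_SPAN + DOWNSTREAM_OF_STOP == TOTAL_LENGTH

# Each predicted start site plus its stated distance should point to the
# same 5' end of the longest cDNA isolated by RACE.
predictions = {"TSSW": (2644, 4), "TSSG": (2635, 13)}
cdna_starts = {name: pos + dist for name, (pos, dist) in predictions.items()}
print(cdna_starts)
```

Under these assumptions both predictions point to the same position for
the 5' end of the longest cDNA, so the two programs differ only in how far
upstream of it they place the transcription start.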

A hpa promoter-GFP reporter vector was constructed in order to
investigate the regulation of hpa transcription. Two constructs were made,
containing 1.8 kb and 1.1 kb of the hpa promoter region. The reporter
vector was transfected into T50 mouse bladder carcinoma cells. Cells
transfected with both constructs exhibited green fluorescence, which
indicated the promoter activity of the genomic sequence upstream of the
hpa-coding region. This reporter vector enables the monitoring of hpa
promoter activity under various conditions and in different cell types, and
the characterization of the factors involved in regulation of hpa expression.

Although the invention has been described in conjunction with
specific embodiments thereof, it is evident that many alternatives,
modifications and variations will be apparent to those skilled in the art.
Accordingly, it is intended to embrace all such alternatives, modifications
and variations that fall within the spirit and broad scope of the appended
claims.

SEQUENCE LISTING

GENERAL INFORMATION:

(i) APPLICANT: Iris Pecker, Israel Vlodavsky and Elena Feinstein

(ii) TITLE OF INVENTION: POLYNUCLEOTIDE ENCODING A POLYPEPTIDE
HAVING HEPARANASE ACTIVITY AND EXPRESSION
OF SAME IN GENETICALLY MODIFIED CELLS

(iii) NUMBER OF SEQUENCES: 47

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Mark M. Friedman c/o Anthony Castorina

(B) STREET: 2001 Jefferson Davis Highway, Suite 207

(C) CITY: Arlington

(D) STATE: Virginia

(E) COUNTRY: United States of America

(F) ZIP: 22202

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: 1.44 megabyte, 3.5" microdisk

(B) COMPUTER: Twinhead* Slimnote-890TX

(C) OPERATING SYSTEM: MS DOS version 6.2, Windows version 3.11

(D) SOFTWARE: Word for Windows version 2.0 converted to an ASCII file

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 08/922,170

(B) FILING DATE: 2 SEP 1997

(A) APPLICATION NUMBER: 09/109,386

(B) FILING DATE: 10 JUL 1998

(A) APPLICATION NUMBER: PCT/US98/17954

(B) FILING DATE: 31 AUG 1998

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Friedman, Mark M.

(B) REGISTRATION NUMBER: 33,883

(C) REFERENCE/DOCKET NUMBER: 910/14

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 972-3-5625553

(B) TELEFAX: 972-3-5625554

(C) TELEX:

INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CCATCCTAAT ACGACTCACT ATAGGGC 27



INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

GTAGTGATGC CATGTAACTG AATC 24



INFORMATION FOR SEQ ID NO : 3 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ACTCACTATA GGGCTCGAGC GGC 23 



INFORMATION FOR SEQ ID NO : 4 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GCATCTTAGC CGTCTTTCTT CG 22 



INFORMATION FOR SEQ ID NO : 5 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5 : 

INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

TTCGATCCCA AGAAGGAATC AAC 23

INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GTAGTGATGC CATGTAACTG AATC 24

INFORMATION FOR SEQ ID NO : 8 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 8 : 

Tyr Gly Pro Asp Val Gly Gln Pro Arg

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1721
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CTAGAGCTTT CGACTCTCCG CTGCGCGGCA GCTGGCGGGG GGAGCAGCCA GGTGAGCCCA 60
AGATGCTGCT GCGCTCGAAG CCTGCGCTGC CGCCGCCGCT GATGCTGCTG CTCCTGGGGC 120
CGCTGGGTCC CCTCTCCCCT GGCGCCCTGC CCCGACCTGC GCAAGCACAG GACGTCGTGG 180
ACCTGGACTT CTTCACCCAG GAGCCGCTGC ACCTGGTGAG CCCCTCGTTC CTGTCCGTCA 240
CCATTGACGC CAACCTGGCC ACGGACCCGC GGTTCCTCAT CCTCCTGGGT TCTCCAAAGC 300
TTCGTACCTT GGCCAGAGGC TTGTCTCCTG CGTACCTGAG GTTTGGTGGC ACCAAGACAG 360
ACTTCCTAAT TTTCGATCCC AAGAAGGAAT CAACCTTTGA AGAGAGAAGT TACTGGCAAT 420
CTCAAGTCAA CCAGGATATT TGCAAATATG GATCCATCCC TCCTGATGTG GAGGAGAAGT 480
TACGGTTGGA ATGGCCCTAC CAGGAGCAAT TGCTACTCCG AGAACACTAC CAGAAAAAGT 540
TCAAGAACAG CACCTACTCA AGAAGCTCTG TAGATGTGCT ATACACTTTT GCAAACTGCT 600
CAGGACTGGA CTTGATCTTT GGCCTAAATG CGTTATTAAG AACAGCAGAT TTGCAGTGGA 660
ACAGTTCTAA TGCTCAGTTG CTCCTGGACT ACTGCTCTTC CAAGGGGTAT AACATTTCTT 720
GGGAACTAGG CAATGAACCT AACAGTTTCC TTAAGAAGGC TGATATTTTC ATCAATGGGT 780
CGCAGTTAGG AGAAGATTAT ATTCAATTGC ATAAACTTCT AAGAAAGTCC ACCTTCAAAA 840
ATGCAAAACT CTATGGTCCT GATGTTGGTC AGCCTCGAAG AAAGACGGCT AAGATGCTGA 900
AGAGCTTCCT GAAGGCTGGT GGAGAAGTGA TTGATTCAGT TACATGGCAT CACTACTATT 960
TGAATGGACG GACTGCTACC AGGGAAGATT TTCTAAACCC TGATGTATTG GACATTTTTA 1020
TTTCATCTGT GCAAAAAGTT TTCCAGGTGG TTGAGAGCAC CAGGCCTGGC AAGAAGGTCT 1080
GGTTAGGAGA AACAAGCTCT GCATATGGAG GCGGAGCGCC CTTGCTATCC GACACCTTTG 1140
CAGCTGGCTT TATGTGGCTG GATAAATTGG GCCTGTCAGC CCGAATGGGA ATAGAAGTGG 1200
TGATGAGGCA AGTATTCTTT GGAGCAGGAA ACTACCATTT AGTGGATGAA AACTTCGATC 1260
CTTTACCTGA TTATTGGCTA TCTCTTCTGT TCAAGAAATT GGTGGGCACC AAGGTGTTAA 1320
TGGCAAGCGT GCAAGGTTCA AAGAGAAGGA AGCTTCGAGT ATACCTTCAT TGCACAAACA 1380
CTGACAATCC AAGGTATAAA GAAGGAGATT TAACTCTGTA TGCCATAAAC CTCCATAACG 1440
TCACCAAGTA CTTGCGGTTA CCCTATCCTT TTTCTAACAA GCAAGTGGAT AAATACCTTC 1500
TAAGACCTTT GGGACCTCAT GGATTACTTT CCAAATCTGT CCAACTCAAT GGTCTAACTC 1560
TAAAGATGGT GGATGATCAA ACCTTGCCAC CTTTAATGGA AAAACCTCTC CGGCCAGGAA 1620
GTTCACTGGG CTTGCCAGCT TTCTCATATA GTTTTTTTGT GATAAGAAAT GCCAAAGTTG 1680
CTGCTTGCAT CTGAAAATAA AATATACTAG TCCTGACACT G 1721



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 543
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro Pro Leu Met Leu Leu
5 10 15

Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro Gly Ala Leu Pro Arg Pro
20 25 30

Ala Gln Ala Gln Asp Val Val Asp Leu Asp Phe Phe Thr Gln Glu Pro
35 40 45

Leu His Leu Val Ser Pro Ser Phe Leu Ser Val Thr Ile Asp Ala Asn
50 55 60

Leu Ala Thr Asp Pro Arg Phe Leu Ile Leu Leu Gly Ser Pro Lys Leu
65 70 75 80

Arg Thr Leu Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly
85 90 95

Thr Lys Thr Asp Phe Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe
100 105 110

Glu Glu Arg Ser Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys
115 120 125

Tyr Gly Ser Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp
130 135 140

Pro Tyr Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe
145 150 155 160

Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
165 170 175

Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu Leu
180 185 190

Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu Leu Leu
195 200 205

Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu Leu Gly Asn
210 215 220

Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe Ile Asn Gly Ser
225 230 235 240

Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys Leu Leu Arg Lys Ser
245 250 255

Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro Asp Val Gly Gln Pro Arg
260 265 270

Arg Lys Thr Ala Lys Met Leu Lys Ser Phe Leu Lys Ala Gly Gly Glu
275 280 285

Val Ile Asp Ser Val Thr Trp His His Tyr Tyr Leu Asn Gly Arg Thr
290 295 300

Ala Thr Arg Glu Asp Phe Leu Asn Pro Asp Val Leu Asp Ile Phe Ile
305 310 315 320

Ser Ser Val Gln Lys Val Phe Gln Val Val Glu Ser Thr Arg Pro Gly
325 330 335

Lys Lys Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala
340 345 350

Pro Leu Leu Ser Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys
355 360 365

Leu Gly Leu Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val
370 375 380

Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro
385 390 395 400

Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
405 410 415

Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu Arg
420 425 430

Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys Glu Gly
435 440 445

Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr Lys Tyr Leu
450 455 460

Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp Lys Tyr Leu Leu
465 470 475 480

Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys Ser Val Gln Leu Asn
485 490 495

Gly Leu Thr Leu Lys Met Val Asp Asp Gln Thr Leu Pro Pro Leu Met
500 505 510

Glu Lys Pro Leu Arg Pro Gly Ser Ser Leu Gly Leu Pro Ala Phe Ser
515 520 525

Tyr Ser Phe Phe Val Ile Arg Asn Ala Lys Val Ala Ala Cys Ile
530 535 540 543

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1721
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CT AGA GCT TTC GAC 14

TCT CCG CTG CGC GGC AGC TGG CGG GGG GAG CAG CCA GGT GAG CCC AAG 62

ATG CTG CTG CGC TCG AAG CCT GCG CTG CCG CCG CCG CTG ATG CTG CTG 110
Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro Pro Leu Met Leu Leu
5 10 15

CTC CTG GGG CCG CTG GGT CCC CTC TCC CCT GGC GCC CTG CCC CGA CCT 158
Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro Gly Ala Leu Pro Arg Pro
20 25 30

GCG CAA GCA CAG GAC GTC GTG GAC CTG GAC TTC TTC ACC CAG GAG CCG 206
Ala Gln Ala Gln Asp Val Val Asp Leu Asp Phe Phe Thr Gln Glu Pro
35 40 45

CTG CAC CTG GTG AGC CCC TCG TTC CTG TCC GTC ACC ATT GAC GCC AAC 254
Leu His Leu Val Ser Pro Ser Phe Leu Ser Val Thr Ile Asp Ala Asn
50 55 60

CTG GCC ACG GAC CCG CGG TTC CTC ATC CTC CTG GGT TCT CCA AAG CTT 302
Leu Ala Thr Asp Pro Arg Phe Leu Ile Leu Leu Gly Ser Pro Lys Leu
65 70 75 80

CGT ACC TTG GCC AGA GGC TTG TCT CCT GCG TAC CTG AGG TTT GGT GGC 350
Arg Thr Leu Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly
85 90 95

ACC AAG ACA GAC TTC CTA ATT TTC GAT CCC AAG AAG GAA TCA ACC TTT 398
Thr Lys Thr Asp Phe Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe
100 105 110

GAA GAG AGA AGT TAC TGG CAA TCT CAA GTC AAC CAG GAT ATT TGC AAA 446
Glu Glu Arg Ser Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys
115 120 125

TAT GGA TCC ATC CCT CCT GAT GTG GAG GAG AAG TTA CGG TTG GAA TGG 494
Tyr Gly Ser Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp
130 135 140

CCC TAC CAG GAG CAA TTG CTA CTC CGA GAA CAC TAC CAG AAA AAG TTC 542
Pro Tyr Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe
145 150 155 160

AAG AAC AGC ACC TAC TCA AGA AGC TCT GTA GAT GTG CTA TAC ACT TTT 590
Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
165 170 175

GCA AAC TGC TCA GGA CTG GAC TTG ATC TTT GGC CTA AAT GCG TTA TTA 638
Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu Leu
180 185 190

AGA ACA GCA GAT TTG CAG TGG AAC AGT TCT AAT GCT CAG TTG CTC CTG 686
Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu Leu Leu
195 200 205

GAC TAC TGC TCT TCC AAG GGG TAT AAC ATT TCT TGG GAA CTA GGC AAT 734
Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu Leu Gly Asn
210 215 220

GAA CCT AAC AGT TTC CTT AAG AAG GCT GAT ATT TTC ATC AAT GGG TCG 782
Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe Ile Asn Gly Ser
225 230 235 240

CAG TTA GGA GAA GAT TAT ATT CAA TTG CAT AAA CTT CTA AGA AAG TCC 830
Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys Leu Leu Arg Lys Ser
245 250 255

ACC TTC AAA AAT GCA AAA CTC TAT GGT CCT GAT GTT GGT CAG CCT CGA 878
Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro Asp Val Gly Gln Pro Arg
260 265 270

AGA AAG ACG GCT AAG ATG CTG AAG AGC TTC CTG AAG GCT GGT GGA GAA 926
Arg Lys Thr Ala Lys Met Leu Lys Ser Phe Leu Lys Ala Gly Gly Glu
275 280 285

GTG ATT GAT TCA GTT ACA TGG CAT CAC TAC TAT TTG AAT GGA CGG ACT 974
Val Ile Asp Ser Val Thr Trp His His Tyr Tyr Leu Asn Gly Arg Thr
290 295 300

GCT ACC AGG GAA GAT TTT CTA AAC CCT GAT GTA TTG GAC ATT TTT ATT 1022
Ala Thr Arg Glu Asp Phe Leu Asn Pro Asp Val Leu Asp Ile Phe Ile
305 310 315 320

TCA TCT GTG CAA AAA GTT TTC CAG GTG GTT GAG AGC ACC AGG CCT GGC 1070
Ser Ser Val Gln Lys Val Phe Gln Val Val Glu Ser Thr Arg Pro Gly
325 330 335

AAG AAG GTC TGG TTA GGA GAA ACA AGC TCT GCA TAT GGA GGC GGA GCG 1118
Lys Lys Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala
340 345 350

CCC TTG CTA TCC GAC ACC TTT GCA GCT GGC TTT ATG TGG CTG GAT AAA 1166
Pro Leu Leu Ser Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys
355 360 365

TTG GGC CTG TCA GCC CGA ATG GGA ATA GAA GTG GTG ATG AGG CAA GTA 1214
Leu Gly Leu Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val
370 375 380

TTC TTT GGA GCA GGA AAC TAC CAT TTA GTG GAT GAA AAC TTC GAT CCT 1262
Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro
385 390 395 400

TTA CCT GAT TAT TGG CTA TCT CTT CTG TTC AAG AAA TTG GTG GGC ACC 1310
Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
405 410 415

AAG GTG TTA ATG GCA AGC GTG CAA GGT TCA AAG AGA AGG AAG CTT CGA 1358
Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu Arg
420 425 430

GTA TAC CTT CAT TGC ACA AAC ACT GAC AAT CCA AGG TAT AAA GAA GGA 1406
Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys Glu Gly
435 440 445

GAT TTA ACT CTG TAT GCC ATA AAC CTC CAT AAC GTC ACC AAG TAC TTG 1454
Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr Lys Tyr Leu
450 455 460

CGG TTA CCC TAT CCT TTT TCT AAC AAG CAA GTG GAT AAA TAC CTT CTA 1502
Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp Lys Tyr Leu Leu
465 470 475 480

AGA CCT TTG GGA CCT CAT GGA TTA CTT TCC AAA TCT GTC CAA CTC AAT 1550
Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys Ser Val Gln Leu Asn
485 490 495

GGT CTA ACT CTA AAG ATG GTG GAT GAT CAA ACC TTG CCA CCT TTA ATG 1598
Gly Leu Thr Leu Lys Met Val Asp Asp Gln Thr Leu Pro Pro Leu Met
500 505 510

GAA AAA CCT CTC CGG CCA GGA AGT TCA CTG GGC TTG CCA GCT TTC TCA 1646
Glu Lys Pro Leu Arg Pro Gly Ser Ser Leu Gly Leu Pro Ala Phe Ser
515 520 525

TAT AGT TTT TTT GTG ATA AGA AAT GCC AAA GTT GCT GCT TGC ATC TGA 1694
Tyr Ser Phe Phe Val Ile Arg Asn Ala Lys Val Ala Ala Cys Ile
530 535 540 543

AAA TAA AAT ATA CTA GTC CTG ACA CTG 1721
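
The pairing of codons and residues in SEQ ID NO: 11 can be reproduced with the standard genetic code. The following is a minimal sketch in Python, not the applicants' software; the names `CODON_TABLE`, `THREE_LETTER` and `translate` are hypothetical, and the standard (nuclear) genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter residue names, as printed in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":  # stop codon ends the open reading frame
            break
        residues.append(THREE_LETTER[amino])
    return " ".join(residues)


# First 16 codons of the open reading frame (positions 63-110 of SEQ ID NO: 11).
print(translate("ATG CTG CTG CGC TCG AAG CCT GCG CTG CCG CCG CCG CTG ATG CTG CTG"))
```

Passing the codon rows in order, starting from the ATG at position 63, gives residues that can be compared line by line with SEQ ID NO: 10.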

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 824
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12

CTGGCAAGAA GGTCTGGTTG GGAGAGACGA GCTCAGCTTA CGGTGGCGGT GCACCCTTGC 60
TGTCCAACAC CTTTGCAGCT GGCTTTATGT GGCTGGATAA ATTGGGCCTG TCAGCCCAGA 120
TGGGCATAGA AGTCGTGATG AGGCAGGTGT TCTTCGGAGC AGGCAACTAC CACTTAGTGG 180
ATGAAAACTT TGAGCCTTTA CCTGATTACT GGCTCTCTCT TCTGTTCAAG AAACTGGTAG 240
GTCCCAGGGT GTTACTGTCA AGAGTGAAAG GCCCAGACAG GAGCAAACTC CGAGTGTATC 300
TCCACTGCAC TAACGTCTAT CACCCACGAT ATCAGGAAGG AGATCTAACT CTGTATGTCC 360
TGAACCTCCA TAATGTCACC AAGCACTTGA AGGTACCGCC TCCGTTGTTC AGGAAACCAG 420
TGGATACGTA CCTTCTGAAG CCTTCGGGGC CGGATGGATT ACTTTCCAAA TCTGTCCAAC 480
TGAACGGTCA AATTCTGAAG ATGGTGGATG AGCAGACCCT GCCAGCTTTG ACAGAAAAAC 540
CTCTCCCCGC AGGAAGTGCA CTAAGCCTGC CTGCCTTTTC CTATGGTTTT TTTGTCATAA 600
GAAATGCCAA AATCGCTGCT TGTATATGAA AATAAAAGGC ATACGGTACC CCTGAGACAA 660
AAGCCGAGGG GGGTGTTATT CATAAAACAA AACCCTAGTT TAGGAGGCCA CCTCCTTGCC 720
GAGTTCCAGA GCTTCGGGAG GGTGGGGTAC ACTTCAGTAT TACATTCAGT GTGGTGTTCT 780
CTCTAAGAAG AATACTGCAG GTGGTGACAG TTAATAGCAC TGTG 824

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1899
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13

GGGAAAGCGA GCAAGGAAGT AGGAGAGAGC CGGGCAGGCG GGGCGGGGTT GGATTGGGAG 60
CAGTGGGAGG GATGCAGAAG AGGAGTGGGA GGGATGGAGG GCGCAGTGGG AGGGGTGAGG 120
AGGCGTAACG GGGCGGAGGA AAGGAGAAAA GGGCGCTGGG GCTCGGCGGG AGGAAGTGCT 180
AGAGCTCTCG ACTCTCCGCT GCGCGGCAGC TGGCGGGGGG AGCAGCCAGG TGAGCCCAAG 240
ATGCTGCTGC GCTCGAAGCC TGCGCTGCCG CCGCCGCTGA TGCTGCTGCT CCTGGGGCCG 300
CTGGGTCCCC TCTCCCCTGG CGCCCTGCCC CGACCTGCGC AAGCACAGGA CGTCGTGGAC 360
CTGGACTTCT TCACCCAGGA GCCGCTGCAC CTGGTGAGCC CCTCGTTCCT GTCCGTCACC 420
ATTGACGCCA ACCTGGCCAC GGACCCGCGG TTCCTCATCC TCCTGGGTTC TCCAAAGCTT 480
CGTACCTTGG CCAGAGGCTT GTCTCCTGCG TACCTGAGGT TTGGTGGCAC CAAGACAGAC 540
TTCCTAATTT TCGATCCCAA GAAGGAATCA ACCTTTGAAG AGAGAAGTTA CTGGCAATCT 600
CAAGTCAACC AGGATATTTG CAAATATGGA TCCATCCCTC CTGATGTGGA GGAGAAGTTA 660
CGGTTGGAAT GGCCCTACCA GGAGCAATTG CTACTCCGAG AACACTACCA GAAAAAGTTC 720
AAGAACAGCA CCTACTCAAG AAGCTCTGTA GATGTGCTAT ACACTTTTGC AAACTGCTCA 780
GGACTGGACT TGATCTTTGG CCTAAATGCG TTATTAAGAA CAGCAGATTT GCAGTGGAAC 840
AGTTCTAATG CTCAGTTGCT CCTGGACTAC TGCTCTTCCA AGGGGTATAA CATTTCTTGG 900
GAACTAGGCA ATGAACCTAA CAGTTTCCTT AAGAAGGCTG ATATTTTCAT CAATGGGTCG 960
CAGTTAGGAG AAGATTATAT TCAATTGCAT AAACTTCTAA GAAAGTCCAC CTTCAAAAAT 1020
GCAAAACTCT ATGGTCCTGA TGTTGGTCAG CCTCGAAGAA AGACGGCTAA GATGCTGAAG 1080
AGCTTCCTGA AGGCTGGTGG AGAAGTGATT GATTCAGTTA CATGGCATCA CTACTATTTG 1140
AATGGACGGA CTGCTACCAG GGAAGATTTT CTAAACCCTG ATGTATTGGA CATTTTTATT 1200
TCATCTGTGC AAAAAGTTTT CCAGGTGGTT GAGAGCACCA GGCCTGGCAA GAAGGTCTGG 1260
TTAGGAGAAA CAAGCTCTGC ATATGGAGGC GGAGCGCCCT TGCTATCCGA CACCTTTGCA 1320
GCTGGCTTTA TGTGGCTGGA TAAATTGGGC CTGTCAGCCC GAATGGGAAT AGAAGTGGTG 1380
ATGAGGCAAG TATTCTTTGG AGCAGGAAAC TACCATTTAG TGGATGAAAA CTTCGATCCT 1440
TTACCTGATT ATTGGCTATC TCTTCTGTTC AAGAAATTGG TGGGCACCAA GGTGTTAATG 1500
GCAAGCGTGC AAGGTTCAAA GAGAAGGAAG CTTCGAGTAT ACCTTCATTG CACAAACACT 1560
GACAATCCAA GGTATAAAGA AGGAGATTTA ACTCTGTATG CCATAAACCT CCATAACGTC 1620
ACCAAGTACT TGCGGTTACC CTATCCTTTT TCTAACAAGC AAGTGGATAA ATACCTTCTA 1680
AGACCTTTGG GACCTCATGG ATTACTTTCC AAATCTGTCC AACTCAATGG TCTAACTCTA 1740
AAGATGGTGG ATGATCAAAC CTTGCCACCT TTAATGGAAA AACCTCTCCG GCCAGGAAGT 1800
TCACTGGGCT TGCCAGCTTT CTCATATAGT TTTTTTGTGA TAAGAAATGC CAAAGTTGCT 1860
GCTTGCATCT GAAAATAAAA TATACTAGTC CTGACACTG 1899



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 592
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14

Met Glu Gly Ala Val Gly Gly Val Arg Arg Arg Asn Gly Ala Glu
5 10 15

Glu Arg Arg Lys Gly Arg Trp Gly Ser Ala Gly Gly Ser Ala Arg
20 25 30

Ala Leu Asp Ser Pro Leu Arg Gly Ser Trp Arg Gly Glu Gln Pro
35 40 45

Gly Glu Pro Lys Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro
50 55 60

Pro Leu Met Leu Leu Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro
65 70 75

Gly Ala Leu Pro Arg Pro Ala Gln Ala Gln Asp Val Val Asp Leu
80 85 90

Asp Phe Phe Thr Gln Glu Pro Leu His Leu Val Ser Pro Ser Phe
95 100 105

Leu Ser Val Thr Ile Asp Ala Asn Leu Ala Thr Asp Pro Arg Phe
110 115 120

Leu Ile Leu Leu Gly Ser Pro Lys Leu Arg Thr Leu Ala Arg Gly
125 130 135

Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys Thr Asp Phe
140 145 150

Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe Glu Glu Arg Ser
155 160 165

Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys Tyr Gly Ser
170 175 180

Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp Pro Tyr
185 190 195

Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe Lys
200 205 210

Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
215 220 225

Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu
230 235 240

Leu Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu
245 250 255

Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu
260 265 270

Leu Gly Asn Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe
275 280 285

Ile Asn Gly Ser Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys
290 295 300

Leu Leu Arg Lys Ser Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro
305 310 315

Asp Val Gly Gln Pro Arg Arg Lys Thr Ala Lys Met Leu Lys Ser
320 325 330

Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Val Thr Trp His
335 340 345

His Tyr Tyr Leu Asn Gly Arg Thr Ala Thr Arg Glu Asp Phe Leu
350 355 360

Asn Pro Asp Val Leu Asp Ile Phe Ile Ser Ser Val Gln Lys Val
365 370 375

Phe Gln Val Val Glu Ser Thr Arg Pro Gly Lys Lys Val Trp Leu
380 385 390

Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro Leu Leu Ser
395 400 405

Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys Leu Gly Leu
410 415 420

Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val Phe Phe
425 430 435

Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro Leu
440 445 450

Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
455 460 465

Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu
470 475 480

Arg Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys
485 490 495

Glu Gly Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr
500 505 510

Lys Tyr Leu Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp
515 520 525

Lys Tyr Leu Leu Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys
530 535 540

Ser Val Gln Leu Asn Gly Leu Thr Leu Lys Met Val Asp Asp Gln
545 550 555

Thr Leu Pro Pro Leu Met Glu Lys Pro Leu Arg Pro Gly Ser Ser
560 565 570

Leu Gly Leu Pro Ala Phe Ser Tyr Ser Phe Phe Val Ile Arg Asn
575 580 585

Ala Lys Val Ala Ala Cys Ile
590 592

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1899
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15

GGG 3

AAA GCG AGC AAG GAA GTA GGA GAG AGC CGG GCA GGC GGG GCG GGG 48

TTG GAT TGG GAG CAG TGG GAG GGA TGC AGA AGA GGA GTG GGA GGG 93

ATG GAG GGC GCA GTG GGA GGG GTG AGG AGG CGT AAC GGG GCG GAG 138
Met Glu Gly Ala Val Gly Gly Val Arg Arg Arg Asn Gly Ala Glu
5 10 15

GAA AGG AGA AAA GGG CGC TGG GGC TCG GCG GGA GGA AGT GCT AGA 183
Glu Arg Arg Lys Gly Arg Trp Gly Ser Ala Gly Gly Ser Ala Arg
20 25 30

GCT CTC GAC TCT CCG CTG CGC GGC AGC TGG CGG GGG GAG CAG CCA 228
Ala Leu Asp Ser Pro Leu Arg Gly Ser Trp Arg Gly Glu Gln Pro
35 40 45

GGT GAG CCC AAG ATG CTG CTG CGC TCG AAG CCT GCG CTG CCG CCG 273
Gly Glu Pro Lys Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro
50 55 60

CCG CTG ATG CTG CTG CTC CTG GGG CCG CTG GGT CCC CTC TCC CCT 318
Pro Leu Met Leu Leu Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro
65 70 75

GGC GCC CTG CCC CGA CCT GCG CAA GCA CAG GAC GTC GTG GAC CTG 363
Gly Ala Leu Pro Arg Pro Ala Gln Ala Gln Asp Val Val Asp Leu
80 85 90

GAC TTC TTC ACC CAG GAG CCG CTG CAC CTG GTG AGC CCC TCG TTC 408
Asp Phe Phe Thr Gln Glu Pro Leu His Leu Val Ser Pro Ser Phe
95 100 105

CTG TCC GTC ACC ATT GAC GCC AAC CTG GCC ACG GAC CCG CGG TTC 453
Leu Ser Val Thr Ile Asp Ala Asn Leu Ala Thr Asp Pro Arg Phe
110 115 120

CTC ATC CTC CTG GGT TCT CCA AAG CTT CGT ACC TTG GCC AGA GGC 498
Leu Ile Leu Leu Gly Ser Pro Lys Leu Arg Thr Leu Ala Arg Gly
125 130 135

TTG TCT CCT GCG TAC CTG AGG TTT GGT GGC ACC AAG ACA GAC TTC 543
Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys Thr Asp Phe
140 145 150

CTA ATT TTC GAT CCC AAG AAG GAA TCA ACC TTT GAA GAG AGA AGT 588
Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe Glu Glu Arg Ser
155 160 165

TAC TGG CAA TCT CAA GTC AAC CAG GAT ATT TGC AAA TAT GGA TCC 633
Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys Tyr Gly Ser
170 175 180

ATC CCT CCT GAT GTG GAG GAG AAG TTA CGG TTG GAA TGG CCC TAC 678
Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp Pro Tyr
185 190 195

CAG GAG CAA TTG CTA CTC CGA GAA CAC TAC CAG AAA AAG TTC AAG 723
Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe Lys
200 205 210

AAC AGC ACC TAC TCA AGA AGC TCT GTA GAT GTG CTA TAC ACT TTT 768
Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
215 220 225

GCA AAC TGC TCA GGA CTG GAC TTG ATC TTT GGC CTA AAT GCG TTA 813
Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu
230 235 240

TTA AGA ACA GCA GAT TTG CAG TGG AAC AGT TCT AAT GCT CAG TTG 858
Leu Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu
245 250 255

CTC CTG GAC TAC TGC TCT TCC AAG GGG TAT AAC ATT TCT TGG GAA 903
Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu
260 265 270

CTA GGC AAT GAA CCT AAC AGT TTC CTT AAG AAG GCT GAT ATT TTC 948
Leu Gly Asn Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe
275 280 285

ATC AAT GGG TCG CAG TTA GGA GAA GAT TAT ATT CAA TTG CAT AAA 993
Ile Asn Gly Ser Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys
290 295 300

CTT CTA AGA AAG TCC ACC TTC AAA AAT GCA AAA CTC TAT GGT CCT 1038
Leu Leu Arg Lys Ser Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro
305 310 315

GAT GTT GGT CAG CCT CGA AGA AAG ACG GCT AAG ATG CTG AAG AGC 1083
Asp Val Gly Gln Pro Arg Arg Lys Thr Ala Lys Met Leu Lys Ser
320 325 330

TTC CTG AAG GCT GGT GGA GAA GTG ATT GAT TCA GTT ACA TGG CAT 1128
Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Val Thr Trp His
335 340 345

CAC TAC TAT TTG AAT GGA CGG ACT GCT ACC AGG GAA GAT TTT CTA 1173
His Tyr Tyr Leu Asn Gly Arg Thr Ala Thr Arg Glu Asp Phe Leu
350 355 360

AAC CCT GAT GTA TTG GAC ATT TTT ATT TCA TCT GTG CAA AAA GTT 1218
Asn Pro Asp Val Leu Asp Ile Phe Ile Ser Ser Val Gln Lys Val
365 370 375

TTC CAG GTG GTT GAG AGC ACC AGG CCT GGC AAG AAG GTC TGG TTA 1263
Phe Gln Val Val Glu Ser Thr Arg Pro Gly Lys Lys Val Trp Leu
380 385 390

GGA GAA ACA AGC TCT GCA TAT GGA GGC GGA GCG CCC TTG CTA TCC 1308
Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro Leu Leu Ser
395 400 405

GAC ACC TTT GCA GCT GGC TTT ATG TGG CTG GAT AAA TTG GGC CTG 1353
Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys Leu Gly Leu
410 415 420

TCA GCC CGA ATG GGA ATA GAA GTG GTG ATG AGG CAA GTA TTC TTT 1398
Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val Phe Phe
425 430 435

GGA GCA GGA AAC TAC CAT TTA GTG GAT GAA AAC TTC GAT CCT TTA 1443
Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro Leu
440 445 450

CCT GAT TAT TGG CTA TCT CTT CTG TTC AAG AAA TTG GTG GGC ACC 1488
Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
455 460 465

AAG GTG TTA ATG GCA AGC GTG CAA GGT TCA AAG AGA AGG AAG CTT 1533
Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu
470 475 480

CGA GTA TAC CTT CAT TGC ACA AAC ACT GAC AAT CCA AGG TAT AAA 1578
Arg Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys
485 490 495

GAA GGA GAT TTA ACT CTG TAT GCC ATA AAC CTC CAT AAC GTC ACC 1623
Glu Gly Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr
500 505 510

AAG TAC TTG CGG TTA CCC TAT CCT TTT TCT AAC AAG CAA GTG GAT 1668
Lys Tyr Leu Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp
515 520 525

AAA TAC CTT CTA AGA CCT TTG GGA CCT CAT GGA TTA CTT TCC AAA 1713
Lys Tyr Leu Leu Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys
530 535 540

TCT GTC CAA CTC AAT GGT CTA ACT CTA AAG ATG GTG GAT GAT CAA 1758
Ser Val Gln Leu Asn Gly Leu Thr Leu Lys Met Val Asp Asp Gln
545 550 555

ACC TTG CCA CCT TTA ATG GAA AAA CCT CTC CGG CCA GGA AGT TCA 1803
Thr Leu Pro Pro Leu Met Glu Lys Pro Leu Arg Pro Gly Ser Ser
560 565 570

CTG GGC TTG CCA GCT TTC TCA TAT AGT TTT TTT GTG ATA AGA AAT 1848
Leu Gly Leu Pro Ala Phe Ser Tyr Ser Phe Phe Val Ile Arg Asn
575 580 585

GCC AAA GTT GCT GCT TGC ATC TGA AAA TAA AAT ATA CTA GTC CTG 1893
Ala Lys Val Ala Ala Cys Ile
590 592

ACA CTG 1899

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 594
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16

ATTACTATAG GGCACGCGTG GTCGACGGCC CGGGCTGGTA TTGTCTTAAT GAGAAGTTGA 60
TAAAGAATTT TGGGTGGTTG ATCTCTTTCC AGCTGCAGTT TAGCGTATGC TGAGGCCAGA 120
TTTTTTCAGG CAAAAGTAAA ATACCTGAGA AACTGCCTGG CCAGAGGACA ATCAGATTTT 180
GGCTGGCTCA AGTGACAAGC AAGTGTTTAT AAGCTAGATG GGAGAGGAAG GGATGAATAC 240
TCCATTGGAG GCTTTACTCG AGGGTCAGAG GGATACCCGG CGCCATCAGA ATGGGATCTG 300
GGAGTCGGAA ACGCTGGGTT CCCACGAGAG CGCGCAGAAC ACGTGCGTCA GGAAGCCTGG 360
TCCGGGATGC CCAGCGCTGC TCCCCGGGCG CTCCTCCCCG GGCGCTCCTC CCCAGGCCTC 420
CCGGGCGCTT GGATCCCGGC CATCTCCGCA CCCTTCAAGT GGGTGTGGGT GATTTCGTAA 480
GTGAACGTGA CCGCCACCGG GGGGAAAGCG AGCAAGGAAG TAGGAGAGAG CCGGGCAGGC 540
GGGGCGGGGT TGGATTGGGA GCAGTGGGAG GGATGCAGAA GAGGAGTGGG AGGG 594



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 
CCCCAGGAGC AGCAGCATCA G 21 
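
Whether a short oligonucleotide matches the sense or the complementary strand of a longer entry can be checked by searching for its reverse complement. The following is a minimal sketch in Python; the listing itself does not state the orientation of any primer, and the helper name `reverse_complement` is hypothetical. The fragment used below is copied from positions 91-120 of SEQ ID NO: 9.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string, ignoring spaces."""
    seq = "".join(seq.split()).upper()
    # Complement each base, then reverse the whole string.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


seq_id_17 = "CCCCAGGAGC AGCAGCATCA G"

# Positions 91-120 of SEQ ID NO: 9, as printed in the listing.
seq_id_9_fragment = "CGCCGCCGCT GATGCTGCTG CTCCTGGGGC".replace(" ", "")

print(reverse_complement(seq_id_17))
print(reverse_complement(seq_id_17) in seq_id_9_fragment)
```

A match of the reverse complement, as here, indicates that the oligonucleotide is complementary to the strand shown in SEQ ID NO: 9.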



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 

AGGCTTCGAG CGCAGCAGCA T 21 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19

GTAATACGAC TCACTATAGG GC 22 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20 

ACTATAGGGC ACGCGTGGT 19 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21 
CTTGGGCTCA CCTGGCTGCT C 21

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22
AGCTCTGTAG ATGTGCTATA CAC 23 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
GCATCTTAGC CGTCTTTCTT CG 22

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
GAGCAGCCAG GTGAGCCCAA GAT 23

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
TTCGATCCCA AGAAGGAATC AAC 23

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26
AGCTCTGTAG ATGTGCTATA CAC 23 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 

TCAGATGCAA GCAGCAACTT TGGC 24 



(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
GCATCTTAGC CGTCTTTCTT CG 22 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 
GTAGTGATGC CATGTAACTG AATC 24 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
AGGCACCCTA GAGATGTTCC AG 22 

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31

GAAGATTTCT GTTTCCATGA CGTG 24



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
CCACACTGAA TGTAATACTG AAGTG 25



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
CGAAGCTCTG GAACTCGGCA AG 22 



(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34
GCCAGCTGCA AAGGTGTTGG AC 22



(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 
AACACCTGCC TCATCACGAC TTC 23 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36

GCCAGGCTGG CGTCGATGGT GA 22 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 
GTCGATGGTG ATGGACAGGA AC 22 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 

GTAATACGAC TCACTATAGG GC 22 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 
ACTATAGGGC ACGCGTGGT 19 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 
CCATCCTAAT ACGACTCACT ATAGGGC 27

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 

ACTCACTATA GGGCTCGAGC GGC 23 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44848

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42 

GGATCTTGGC TCACTGCAAT CTCTGCCTCC CATGCAATTC TTATGCATCA 50 
GCCTCCTGAG TAGCTTGGAT TATAGGTCTG CGCCACCACT CCTGGCTACA 100 
CCATGTTGCC CAGGCTGGTC TTGAACTCTT GGGCTCTAGT GATCCACCCG 150 
CCTTGGCCTC CCAAAGTGCT GGGATTACAG GTGTGAGCCA TCACACCCGG 200 
CCCCCCGTTT CCATATTAGT AACTCACATG TAGACCACAA GGATGCACTA 250

TTTAGAAAAC TTGCAATGGT CCACTTTTCA AATCACCCAA ACATGTTAAA 300 
GAAATTGGTA TGACTGGGCA TGGCACAGTG GCTCATGCCT GCAATCCTAG 350 
CATTTTGTGA GGCTGAGACG GGCAGATCAC GAGGTCAGGA GATTGAGACC 400

ATCCTGACAG ACATGGTGAA ATCCCATCTC TACTAAAAAT ACAAAACAAT 450

TAGCCGGGGG TGATGGCAGG CCCCTGTAGT CCCAGCTACT CGGGAGGCTG 500 
AGGCAGGAGA ATGGCGTGAA TCCAGGAGGC AGAGCTTGCA GTGAGCCGAG 550 
ATGGTGCCAC TGCACTCCAG CCTGGGCGAC AGAGCGAGAC TCCGTCTCAA 600 
AAAAAAAAAA AAAGAAAGAA ATTGGTATGA CTGTTGACTC ACAACAGGAG 650
TCAGGGGCAT GGGGTGGGGT GTAAGATTAA TGTCATGACA AATGTGGAAA 700

AGAAACTTCT GTTTTTCCAA CTCCACGTCT GCTACCATAT TATTACACTC 750

TTCTGGTAGT GTGGTGTTTA TGTGTGAATT TTTTTTCATA TGTATACAGT 800 
AATTGTAGGA TATGAACCTG ATTCTAGTTG CAAAACTCAC TATGAGCTTA 850 
GCTTTTAAGT TGCTTAAGAA TAGGTAGATC TATGCAAATA ATGATAATTA 900 
TTATTATTAT TTTAAGAGAG GGTCTCACTT TGTCACCCAG GCTGGAGTGC 950 

AGTGGTGTGA TTAAGGGTCA CTGCAACCTC CACCTCCCAG GCTCAAATAA 1000 

ACCTCCCACC TCAGCCTCCC CAGTAGCTGG AACCACAGGC ACGGGCCACC 1050

ACGCCTGGCT AATTTTTTGT ATTTTTTGTA GAGATGGGGT TTCATCATGT 1100 

TGCCCAGGCT GTTCTTGAAT TCCTCGGCTC AAGCAATCCT CCCACCTTGG 1150 

CCTCCCAAAA TGCTGGCATC ACAGGCATGA TGGCATCACT GGCATCACAT 1200 

ACCATGCCTG GCCTGATTTA TGCAAATTAG ATATGCATTT CAAAATAATC 1250

TATTTTTATT TGTTGCCTTA TTGGTGGTAC AATCTCAAGT GGAAAAATCT 1300 

AAGGGTTTTG GTGTTATTTG CTTACTCAAC CAATATTTAT TAGACTCTTA 1350 

CTAAGCACCA ACATGATCAC ATGCCTGAGC TATGGCTAGC ATAGCGTGTG 1400

AGACAAACTT AATCTCTGTT TTGGTGGAGC ATATAATCTA GTAGATGAAG 1450

CCAATGTTGA GCAACATCAC AATACTAACA AATTGAGGAT GCTACGAGAG 1500 

TGTCTAACAA ATTGAGGATG CTACGAGAGT GTCTAACAAA TTGAGGATGC 1550 

TATGAGAGTG TGTCATGGAG AGCTGCCTGG AGATTGAGAG AAAGCTTCCT 1600 

TGAGGGAAGT TACATTTCAG CTGAAACACA CTGCCATCTG CTCGAGGTTT 1650 

TGTAACTGCA TTCACATCCC GATTCTGACA CTTCACATCC CGATTCTGAC 1700

ACTTCACCCA GTTACTGTCT CAGAGCTTGG GTCCGCATGT GTAAAACAAG 1750

GACAGTATGC ACTTGGCAGG GTTGTGAGAA GGGAAGAGAA CACAAGTAAA 1800 

GCACCTGTAT CAGGCATACA GTAGGCACTA AGCGTGCGAT GCTTGCTATG 1850 

ATTATACATC AGTGTAAGCA TCAAGGAAAA GCTGAAGAAA AGTCTGACCA 1900 

ACAGCGAAAG ATAAATGCGC AGAGGAGAAA TTTGGCAAAG GCTCCAAATT 1950

CAGGGGCAGT CCGTACTCTA CACTTTGTAT GGGGGCTTCA GGTCCTGAGT 2000 

TCCAGACATT GGAGCAACTA ACCCTTTAAG ATTGCTAAAT ATTGTCTTAA 2050 

TGAGAAGTTG ATAAAGAATT TTGGGTGGTT GATCTCTTTC CAGCTGCAGT 2100 

TTAGCGTATG CTGAGGCCAG ATTTTTTCAA GCAAAAGTAA AATACCTGAG 2150 

AAACTGCCTG GCCAGAGGAC AATCAGATTT TGGCTGGCTC AAGTGACAAG 2200

CAAGTGTTTA TAAGCTAGAT GGGAGAGGAA GGGATGAATA CTCCATTGGA 2250

GGTTTTACTC GAGGGTCAGA GGGATACCCG GCGCCATCAG AATGGGATCT 2300

GGGAGTCGGA AACGCTGGGT TCCCACGAGA GCGCGCAGAA CACGTGCGTC 2350

AGGAAGCCTG GTCCGGGATG CCCAGCGCTG CTCCCCGGGC GCTCCTCCCC 2400

GGGCGCTCCT CCCCAGGCCT CCCGGGCGCT TGGATCCCGG CCATCTCCGC 2450

ACCCTTCAAG TGGGTGTGGG TGATTTCGTA AGTGAACGTG ACCGCCACCG 2500

AGGGGAAAGC GAGCAAGGAA GTAGGAGAGA GCCGGGCAGG CGGGGCGGGG 2550

TTGGATTGGG AGCAGTGGGA GGGATGCAGA AGAGGAGTGG GAGGGATGGA 2600 

GGGCGCAGTG GGr~.GGGG~G A GGAGGCGTAA CGGGGCGGAG GAAAGGAGAA 2650

AAGGGCGCTG GGGCTCGGCG GGAGGAAGTG CTAGAGCTCT CGACTCTCCG 2700

CTGCGCGGCA GCTGGCGGGG GGAGCAGCCA GGTGAGCCCA AGATGCTGCT 2750

GCGCTCGAAG CCTGCGCTGC CGCCGCCGCT GATGCTGCTG CTCCTGGGGC 2800

CGCTGGGTCC CCTCTCCCCT GGCGCCCTGC CCCGACCTGC GCAAGCACAG 2850 

GACGTCGTGG ACCTGGACTT CTTCACCCAG GAGCCGCTGC ACCTGGTGAG 2900 

CCCCTCGTTC CTGTCCGTCA CCATTGACGC CAACCTGGCC ACGGACCCGC 2950 

GGTTCCTCAT CCTCCTGGGG TAAGCGCCAG CCTCCTGGTC CTGTCCCCTT 3000 

TCCTGTCCTC CTGACACCTA TGTCTGCCCC GCCAGCGGCT CTCCTTCTTT 3050 

TGCGCGGAAA CAACTTCACA CCGGAACCTC CCCGCCTGTC TCTCCCCACC 3100 

CCACTTCCCG CCTCTCATTC TCCCTCTCCC TCCCTTACTC TCAGACCCCA 3150 

AACCGCTTTT TGGGGGGTAT CATTTAAAAA ATAGATTTAG GGGTTACAAG 3200 

TGCAGTTCTG TTCCATGGGT ATATTGCATT GTGGTGGCAT CTGGGCTCTT 3250

AGTGTAACTG TCACCCGAAT GTTGTACATT GTATCTAATA GGTAATTTCT 3300 

CATCCCTCAT CCCTCTCCCA CCCTCCCACC TTTTGGAGTC TCCAGTGTCT 3350 

ACTATTCCAC TAAGTCCATG TGTACACATT GTTTAGCGCC CACTCTAAAT 3400

GAGCCTTTTT GTTTCATTCA TTCTGTAAGT GTTGAATAGG CACCACCTAA 3450

GGTCAGGTAT AAGTGGAAAT TTGAAAAAGA AACTGCCCAC TTGCCCCAGT 3500 

ACTTCCCTAG CCAAGAGGAG GGAAACCAGG CAGGTGCACC TGAAGGCCTG 3550 

TGAGTGCTTG ATTTGCTGTG CAGTGTAGGA CAAGTAAGAT TGTGCATAGC 3600 

CTTCTGTATT TAAGACTGTG TTAGGAAGAT TTCTCTTTCT TTTCTTTTCT 3650 

TTTTCTTTTT TCTTTTCTTT TTTTTTTTTA GGCAGATGAA AAGGGCGTCA 3700

CAGAACAGGA ATAAAAATCT AAATATTCAA TAAATGAGAC CTAGGAGACT 3750

ACTGCAGTGA CTTACAAAGT CCTAATAAAA AGATGTCTCT CCAAAATGGG 3800 

GCTGCAAAAT GTGGTGCTGC CTTATCAGCT CTAAGTTTTT TCCTTACCTG 3850 

AGAAAGAAGG AACCTGATGC AGGTTCAGGG CTCCTGCCCC ATGAATGCAG 3900 

GCTGACTCCA AGATGGGGAG CTACAGGGAC AATCCCAGGT CTTCTAGGCC 3950 

TCTTATTTAG GCCCTGGGAG CCTCCAGAGA TGGCCACATC TTGACCAGCC 4000

CAGATAGAGG GAAAGATCAC CATTATCTCA CCTCTGTGTC AAATACCTAG 4050

ATGCTGTCCT CCCTGAGCCC ACACTATAGT TGCCAGCGCT AATTTAATGG 4100

GTAGTGTACT GGTTAAGAGA TGGACAGACC ATCCTGGCTT GACTCTCAGC 4150 

TCTGGCAAAG ATGAGTGACT TGGTTTTTCC ATATCTCTTG GCCACACCAA 4200 

CCTTGATTTC TTCAGCTGTA GAATGGAATT TCTCAAGCTT GCCTCAAGGA 4250 

TTATTGCCCG AGGATTTGAT GATATGGTAA GAGCTTCTCA GTGTTTGACC 4300

CATAGTAAGT GTTTGACGTT TCAAACGAAT TGTTTCTTTC TAGGACATGG 4350 

TGAGCATTTG GTAGCCATTC ACCGGTTTTC TGTTTCTTTG GATCATAGTT 4400

AACCTCTCCT TTTCCTTCTG GCACTACAAT TTTCTGGTGG GGAAGAATCC 4450

TTACTTTCTG CCCTTCCCCT TAAGGATAGG AAGCTGATAC TAGGCAGCAA 4500 

CTAGTTGGGG GATAGGAAGA TTGTTCCAGA GAAATGCTGA ACCATAGGGC 4550 

TCCAGATCAC AGGACCCCAG TCTTAGCTTG CTGGGGTGTG GGGTGGGGGG 4600

GGGCGGTTAC TGAACATGGG TATGAAGTAG ATGTCCATTT ACTGAAATGT 4650

GAGGACCTGA GGCCTCTTCT ATTGCTGTAG CCAGCATATT CCCCAACCTC 4700 

TCCCCAAGAA AGGACAGATG GGGGTTCCCC CCTGGAGTAA CAGGTCCAAA 4750

AGAAAAAACA TACAGTGGGA CTTCCAGGAT CTGGGCCTGA TCACCCAGCA 4800

GTCAAGCTCC CCGCAATTGA CTAACACCCC CCTAACACGT AGAAATTCCA 4850

ATCTGCAATT TAGTGAGGAT GATACCTTTA TTCTTCTTAA ATACATCTCT 4900

TCATTTCCCA GAGCACCCTT TTTTCCCCTC CTCTGCACCT TTTTGTTAAA 4950

GACTGGAGTA TAATGAAATA CCAAGAGAGC ATAACATGTG ATACATAAAA 5000 

CTTTTTTTCT GGTTTACAAA ACAGTTCATT CTTGTCCATA CGTGCTTCTC 5050 

TCCAAGGCTG GCTGCTGTCT GTTCCAGCCC GCTTCGCTTG GAGAGGCCAT 5100 

CTGCCATACC TGCTCCCCAG ACGCATCGAC AAGCACACCC AGAGTGTTAT 5150 

CTGCTAAGAC CTAAAAGAGG GAGGAACCCC CTCTCCTCAT CTAAGACCTA 5200

GCTTCTAAAT TAGAGTGTGA GGGTCCATCT CCCCAGGAGG GGCACAGGGC 5250 

CCAAACAGCC CAGCCATCTC AGAAGACAAC ACTAAGCTTT GTAGGGGTCC 5300 

ACAGTAGAGG AGAGTAAGAC GCCTGTTGTT TAATTTATTA CAGTTCCTCA 5350 

AAAGTGAAGA TGTGTGGGCG GGATGGCAAG AGCTGAGCAG ACGAAAGCTG 5400

AAGGAATAAG GAAAGAGAGG AGGACACAAA CAGCTGACAC TTCCTCAGTT 5450

CTTGTCATTT GCCTGGCCCT GTTCTAAGCA CCTTCTAGGT ATTAATCCAT 5500 

TTAGTCTTGG CTACAACACT GTGAGTAACT AGTTTTGTCA CCCCCATTTT 5550 

AAAAATGAAG AAAGTGAGGC TCAGGGAGGT TAAGTAACTT GGCCACAGTT 5600 

TGAAACTAGA CTCTGATCAC ATGAGATAAT AGTGCCCATA AAAAGGGAAA 5650

GCAGATTATA TTTTTTAAAG GAAAGAGAGT AGGATATGGT AGAAAAAGAT 5700 

TGTTTGGAAA GGAATTGAGA GATTGATATA ATGAAAAGAA GCATTCACAT 5750

GAGAGTAACA GTATCAGGGC CCAAACCTTC ATCTAAGGTA CTTCAAAGAG 5800 

GCCTAAGCAA ACTTAGTCAC TGGCGTGGTT CTAGTCTCCA TGATGGCAAA 5850 

TACATTGTGT ACAGCCCAAC TCCACACAAA ACTTAAATAC CAATGATAGA 5900 

GCAATCTAAA ATTTGAAAGA AAAAATCTTT CAATTTGTCG TCTTCCCAGA 5950 

GGGACTTAAT CAAGAAACCA ATCAAAATAC TTCCTAAGCC TAACTGTGTG 6000 

CAGAACTCCA AAGAGAGCCC AGCCCTAAAT CAACACTGTC CAATGGAAAT 6050

ATAATATAAT GTGGGCCTCA TATGCAAGGT CATATGTAAT TTTAAATTTT 6100 

CTAGTAGCCA TATTAAAAAG GTAAAAAGAA ACAAGTGAAA TTAATTTTAA 6150 

TAATTTTATT TAGTTCAATA GATCCAAAAT GTTTTCTCAG CATGTAATCA 6200 

ATATAAAAAT ATTAATGAGG TATTTATTAT TCCTTTTCTC AAACCAAGTC 6250 

TATTCTATAA TCTGGCGTGT ATTATTTACA GCACTTCTCA GACTATATTT 6300 

CTTTCTTTCT TTTTTTTTTC CGAGACAATT TTGCTCTTGT CACCCAAGCT 6350 

AGAGTACAAT GGCGTTACCT CGGCTCACTG CAACCTCCGC CTCCCGGGTT 6400

CAAGTTATTC TCCTGCCTCA GTCTCCCAAG TAGCTGGGAC TAGAGGCATG 6450

CACCACCACG CCTGGCTAAT TGTGTATTTT TAGTAGAGAC AGGGTTTCAC 6500 

CATGTTGGCC AGGCTAATCT CAAACTCCTG AGCTCAGGTG ATATGCCCAC 6550 

CTCGGCCTCC CAAAGTGTTG GGATTACAGG CGTGAGCCAC TGCACCCGGC 6600 

CTCAGATTAA CTATATTTCA AGCGTTCAGT AGCCACATGT AGCTACTGCT 6650

ATGGTAGTGG ACAGTACAGA TCTGCATTTC AATTAAGACA CGTATACAAG 6700

CATAGTTCAC TAATGCACGG TAAAAAAAAG TATAGTGCTG AGTCGTTGGT 6750




AGAAATCCTA AATACTGCAG AGCAAAAGTG GTACGAACAG CAATCTCAGT 6800 

GATAATGCAA CCATGCTTGC TTTTCATTGC AATTTGCTTA TTTTCCTTCA 6850

GCAAAGTTCA TCCATTTTTG CCAATTCAAT AAATATTTAC TGATAAAAAC 6900 

TTTCAATATT AGATTCTTGC ATCTTCATAG ACAGAGTTGC TTTTCACATT 6950 

TAGAAAATTA CTTATCAATG TTAAACACAC GTTTTGATAA CCAGTGTTGG 7000 

AAAGAGGTGC AGACTCCCCA TGTGCCTATT GATGGCAGAA ATATTCACAG 7050

CCAAAGGGAA ACAAAGGGCT GGGGACAATC ACACACCTCA TGTCTCCTAA 7100 

CTCCTGGGAA GTGCTGTCCC TCTGATTGAG CTCTTATTAT TGCCTTCCCC 7150 

ACTAACCCTG TCCACTGTGC CCTGGAGCCC TTTGCAGGGT TACCTGCTCT 7200 

GTCCTCCTCA CAGAATATCT CCTCTACCTC CTTGTCCAAG CTACAACTTG 7250

GCTATTCTCT GATGACACTG TCTTCCCTGT AGCCCTTTTG AGTAATGGCT 7300 

GCATATTCTC CCATAGTCCA GTTCTTTTCC TGTTCTCCAG TCTGGCTTCT 7350 

GGATGACAGC CCACTAGTTT GAACTCCATA CTGCTATAGT TCAAGTCCCT 7400

TTTGACTTGT TACCTTGGGC AAATTACCTC CTTTTGTTCA GGTTCCTTGT 7450

TTGTAAAATG ACGATAATAA TGCCATTTGC TTCAGTGGGT TATTTTGAAA 7500 

TTGAGTGAAA GAAGGCGGGT AGCTTCCCTA CACGCTCAGT GTAGACTAGC 7550

CTGATGTGCA TTACGGGTGA TGCCATGACT CAGTGTGTTT TCCTCATCTC 7600

CACATCTGGC TCTCATCCAG TGCTCCTGCT TACGGCACTC TGTCCCCCTC 7650

TTACTTACTC CCCCTTATTA ACTGAAGACT GGCACTGATC TCACAGTTTC 7700

CTCTCCACTT CCTAGTCTCA CCATCATCCT AGATGACTTC AAGTCACCTA 7750

GATAAACTGT CTCAGTTTCT TCACTCACAT TTTTTTATAA CAGATAATGT 7800

TACACTCAAG TTGTAACAGA ACCAGCTTAT CCAGCTCATG AAATGTATGC 7850 

ATTTCATCTC AACTCTGTAT TCAGTGACAT CCTGTGGGTA TCTGGAAATC 7900

AGCCATGGTG AGAATATTTA CCATGGAAAT TGGCAAATAC TAAAAAGCAG 7950

AGCACCTTTT TTTCTGAGAG CCAGACCATA GCTCTTCTAC TCCATAGCAC 8000 

CCATCATAAC AATTTTTAAA TACCTCCACT GAACAGCTTC TTCCTCTCTC 8050 

TACTTCTTCC ATATCTGATT TGAGCTTCTT AATTTATCAT GTGAACCACT 8100 

CTTGTAATAA TAACCCCAAA TCCCTGTTCC ATTGTTCTTC CTGCTAAAAT 8150 

ACTAAACCTG GTTTAGTCCA ACCATATTTT CTCTCTTTGG AATCTACAGG 8200 

GTGGCCCAAA AACCTGGAAA TGGAAAAATA TTACTTATTA ATTTTAATGT 8250 

ATATTAATAA GCCATTTTAA TGCTTCATTT CCAGTCTCAG TGGCCACCCT 8300

GTATAGCTGG GCTATTGAGC TCTTGCGGGA GGAGGGAGTG GACAGTCTCC 8350 

CAGCCACACA GACTGATGTT GCACCAAACA TTTTTTAGCT TCCAGACTTC 8400

CCTGGCCCTT AGTGTTACCC TTAACTCTCC ATTTCTCTGC CTTTCACATT 8450

CTCTACTTTT TAAAAATCTC TGACTCCACC TTCACCTTAT CATTCTTAGC 8500

ACATGACCAT ACTTCTGCTT CCCAAAGAAA ATGAGCAATT ACTTCCTTTT 8550

CCTTTTCCTC CTGTCATCAA ATCTGCAGAC ATGTCATGCC TAAGTCCAGC 8600
TTTCCTCCTT TCTCTGATCT CAGTCTGCTT CTTCCATTTC TGCCCTGAAT 8650
CCCGTCCCCT CCCCAACCCC CAAGGACTTC GCTCTATCAG TCACCTCTTC 8700
CCTCTCCTGT ATCTTCAACT CCTCCCATTT TACTGGCTTC TTCCTCAAGC 8750
CTTTCCCCAA GCCTTTCCCA TCTCAATTAC CTCCTCGCAC ATGCCTCTGC 8800
AGAAACCACC CCGTTTCTTC CCTCCCCTCG GCAGCCTGTT CTTCCTGTTC 8850 
TGCCCTCATG ATGGCACCAT CATTGTGTCA CTAAAATCAA TCTCTCCGAC 8900
ATCATCAATG GCCTTCCTTT GTTGGGAAAC CTAATAAACA CTTTATCTTA 8950
TTTGGTCTTT GTTATGGGTT GAATGAGGTT ACCCCGAAAT CCATATTAGA 9000 
AGTCCTAACC CCCAGTACCT CAGAATGTGA CTTTATTTGG GAATAGGGTC 9050 
ATTGCAGACG TTATTAGTTA GGATGAGGTC ATACTGGAAT GTGATGGGCT 9100 
GCTTATCTAA TATGACTGAT GTCCTTATAA CAAGGAGAAA TTTGGAGACA 9150 
GACACGCACA TAGGGAGAAT ACCATGTGAT GACAGGAGTT ATGGAGTTGG 9200 
AGTCAAAAAG CTATGGGAAC TTAGGAGAAA GACCTGGAAC AAATCCTTTC 9250
CTGCGCCTAG AGAGGGAGTA TGGCCCTGCC ACTACCTTGA ATTCAACGTT 9300 
TCGGCTTTTC AAAACTGTAA GACAATACAT TTCTGTTGTT CAAACCAATT 9350
AGTTTGCAGT ACTCTGCGAC TGCAGCCCTA ACAAACTAAT ACAGTCTCTT 9400
GGAGGCATTT GGCAAGGTTG ACAATGGAAG CACTTTCTTA CCCCTTTAGG 9450
TCTGTCGCCT TTCTTGTTGG GGGGTGTTTT CTAACAATTC CTCTCCATCT 9500 
CTCTCTCTCT AGTTTGTCTT AAACATTGGT GTTCTTCAGA CTTCTGACCT 9550
AGGCCTTCTT TTCACTTCAC ATATTCCCCT GGGTGGTCTC ACCCACTTCC 9600
AGAAATTACT TAAATTACTG CTCATGCAGT ACTGTGCTGG AAACTGTTTA 9650 
ACAACTGGCT CTCTGGGAAG AGGGGAGACT GGTTGATGGT TTTTGCTGAT 9700
TTCTGTGGTG TAAATACTCC CTCCATGGCC AATTCCAAAC TGCCAACAGT 9750
TTAACAACTG GCTCACAAAT TTTCTCCAAA TTTAACATTT GGCTTTCACA 9800 
GGCCAACAAC GTGGTACAGC CAACTCCAGC ACACCTCTGC TTTTGTGTCA 9850
GAGAGAAGTA ACTTATTTTT GTACAAAAGG TAAAATAAAA ACACCTGCAG 9900 
GCCCCCTTTT TTTCCTTAAC AAACTGCTCT AGAAATAGAA TAGCTGAAGC 9950

TTCTTTTATG CATTCATCTG TTATTTCCAT GTCACTGTGG TGGTGGGATT 10000 

ATTTTTCCTT TATTTTTCTT GTATATGGTT GAAATACTGT ACCTTTGATC 10050

AGTTTTAGTT TTATGGCATG TTTTGCACCC ATATTAAATC TAGTTTTTGT 10100 

CAGAGGGCGT CAATATTATT TTCTCAAAAC AAGAAAATAT TTCATTGCAA 10150 

AGGAGACAAA CAAAAAGGTC CTTAATACCA AAACTTTGAA ATGTGATTTC 10200

TTGTACTTGG CAGTGTCCAA GTGGTAAACC CAAACAGTAT TGGGTTTTCA 10250 

TTTTGTTCAG GAAAGTCTTT GTCTGGCAGC GACTTACCCT TACATCAGGC 10300 

GGGCCTTGCT CATTCATTCA CTTAAGTATT TATTAAACAC CAGCGGTGTG 10350 

CCAAGTACTT ATCTAGGTAT CGGGTAGATT CTGATAAGTC AGTCAGGTCC 10400

CTGCTCTCAG GGAGCTTGCA GCAGAGATGG GGGCTGCAAT AGAGAGTAAG 10450

CCAAGGAAAT GAAAAAGGAA GTTGATTTCA GAGAGTGATG AATGCTATGA 10500

AGAAAATGAA GGCAGCGCAG TGTGATGGAG AGTGACCCAA GGTGGTACAG 10550

TTTGTACCTC TAAGGACCAG ACTGTGACCC AGGTCACTCA CAGATGCCCG 10600 

TCATGTGATG CCACAGCAAC TTTTCCAGGT GCTCGTTTCC TCCCACTTCC 10650 

CAGTCTCTTG CCCAGCCGCG ACTGGTTATA AATACAGCTA GAGGAATCTA 10700

AATGAGGTTC CTCTATCATC AAACCCAATC AAAATGCCAA GGAACAGAAT 10750



CAGTGCCTGG CTGAAGGCAG TGGAACAGGG CCAGCCTGGA GTGGTTCTCT 10800
CTGAGGAAGT TCCTCATCTT GGTTTTAGGG CCATACCTTG TGACCTGTGA 10850
GCTAGGGGTT GCCAGTCCCT GACATTTCTA CTGAGGACTC GCCTGTCTAT 10900
ATTCCCGGCC TGTATGTGTC TCCTGAGTTC CAGACACACA GGGCGAAGCG 10950
CCTGATGGAT GGAAGTATGT TTTTTGGTGT TCCATTGGTA TCTCAAATTC 11000
TACAAAACTT AGTGCCCCTT CTCCTCCCTG TTCCTCCCCA TCTTCAGTCT 11050
ATCACCTGTT CCTCATCCAG CAAATGATAT TACCATCTTC CAAGGAGCTT 11100
CCCAGGAGTA ATCCTTGACT CCTCCTCAAC ATCCAATTAA TAATCAAATC 11150
TAGGCCAGGT ACAATAGCTC ACGCCTATAA TCCCAGCACT TTGGGAGGCT 11200
GAGGCAGGTG GATCATTTGA GGCCAGGAGT TCAAGACCAG CCTGGCCAAC 11250
AAGGTGAAAC CTGTCTCATT TAAAAAAAGT TATTTTAAAA ACTCAAATCT 11300
ATTATTTCTA CCTCTAAGTG TGTCTTGAAT TTATCCATCT CTCTCCATCT 11350
CTGAGCTGTT ACCTTACCTC AGTCCATCAC GTTTTGTCTA CGTTAACATG 11400
ACCAGAGTCT TGTTCTTAGT CTGGTGAGGT CACTCCAGCT GCTTCAGATC 11450
CTTCCATGGC TCACCGTTGC CCTCATATAA AGTTGGCACT CCTGGACATG 11500
TGGCTTACGG GGCCCTCCGT GATGTGGCCC TATTTGCTTC TCCATTCTGT 11550
TCTCTCCCAG CCTCTCTGCC CCCATCTCTA GGCACCAACC ACACCCTTCT 11600
GCTCGTCAAT GGTGCCAGCT TCTCTTCTAT CTCTGGTCTT TGGACAGACT 11650
TTTCCCTTCA CCTGGAATGC TTTCTTCAAT CCTACCCCAC TCTCTTTAAT 11700
CTAGATAAGG TTTATTCTTT TTGAATGTCT AGCAGTGAAA CCATTTCCCC 11750
TGAAAAACCT TCTCTAACCA ACCCCCTACC CTCAGCCCAA GGTCTAGATT 11800
AGGAGTCCCT CTGAATGTTT CCATAGCATT TTTAAAGAAT TGCCTATTTA 11850
CTTGTTCGTA TCTATCACTA AACTACAAAT TGTATGAGAA CAGCCACTAT 11900
CTCTGCCTGG TTCACCATTC ATCTCCAGCA ACTAGCATAA TGCCTGGCAG 11950
AGTCAGCCTG CAACAAATAT TTGTTGAATA AATTAACAGA TGGCTTTATC 12000
TCCTTAAGTA AATCTTGCTT TTTTCACCTA TTAAAACAGA CGCACAGGCC 12050
AGGTGTGGTG GCCCATGCCT GTAATCCCAG CACTTTGGCA GGCTGAGGTG 12100
GGCGGATCAC CTGAGGTCAG GAGTTCAAGA CCAGCCTGGC CAACATGGTG 12150
AAACCCCATC TCTAATAAAA ATACAAAAAT TAGCTGGGCA TGGTGGTGGG 12200
TGCGTATAGT CCCAGCTACT AGGGAGGCTG AGGCAAGAGA ATCGCTTGAA 12250
CCCAGGAGGC AGAGGTGGCA GTGAGCCGAG ATCATGCCAC TGTACTCCAG 12300
CCTGGATGAC AGAGACCCTG TCTCAAAACA CACACACACA CACACACACA 12350
CACACACACA CACACACACA CACACACACC AAGTTGTATA ATTTAAAATA 12400
TAACGTGCTT GTTATGGAAC ACTTGTAAAA TACAGGAAAG TAATGAAAAA 12450
GTCTACCATC TAGCTCACCA CATAATGACC ATTGCTATCA TCCTGGCATA 12500
ATTCTCTCCT GTATATAAAT ATATATTCTT TTATTGTTAA AATTACACTA 12550
TGAGTACTAT TTATTTATTT TACTGTGGCA AAATGCGCAA AACATAAAAT 12600
CTTGCCATTT TAAGGTATGC AGTTTGGTGC ATTCACCACA CTCACATTGT 12650
TGTGCAAATA TCACCACTAT CTATCTCAGA ACTTCTTCGT CTTCCCAAAC 12700
TGAAACTCTG TACCCATTAA ACAATAGTGC ATCCTCTGTT TTCCCCTCCC 12750
TACAATTTAT TTTTATTTGG GTTTGTACCA AACTGAAAAT AGCTGCTTCT 12800
TCCTTACTTA GTTCAGATTA GCATTTCCAT TTATTTAGCC GTGGTTTTGA 12850
GGATGCCATG ACAGATGCCA TCCTTCCTAG AGCTCTTTGG GGCTGTCAGG 12900
TATTTCAGTC AGGGTGAATT CGGGTTGATA ACATTTTAAA ATCTCACTTT 12950
ATTCTGAGGT TCCTAGTGTC AGAGCCCACC GTATTTTTAG GGACTCCCAA 13000
GTTACAAACA AAAATATGGT GAGGAGGAAT CACTGAAGTT TTAACACAAG 13050
AGACTTACAT TTTGTTCAAT TTCTATCTTT TAGTTTATTT CCTAAGCATA 13100
AAGAAATACT TTGAAAATTT TACATAGCAT TATACATATT TAATTAAGCA 13150
TGAGCACATC TTAAAACTTT AAATTTTAGA TCAGATCTTT AATTCCTAGG 13200
ATATTAAGAG GTACTGGCAA TTTGGCCAGG TGTGGTGGTT CACGCCTATA 13250
ATCCCAACAC TTTGGGAGGG TGAAGTGGGC GAATTGCTAG AGCCCAGGAG 13300
GTGGAGGCTG CAATGGCCTG AGATCACGCC ATCGTACTCC AGCCTGGATG 13350
ATGAGAATGA AATCCTGTCT CAAAAAAAAA AAAAAAAAAA AAAAGAAGAA 13400
GAAGAAGTAT TGGCAATCAG TGCTCCAGGA ATAATTTCCT GACTTGAAAT 13450
AAAXCTACAT GTAGACAAAC TAATTAGGCC ATTCCAAGAG TTGCTAGCAT 13500
TGGTTTAATA TGTTTTCAGA GCATTCCAGG AAGCAGTGTG GCCAGCATTG 13550
CATGTTTGAT ACTTCAGAAA TGTATGACAG GTGTTTCTCT TACCCAGGTC 13600
TTCTGTTTTC TTAGTTTTGC TCATGTAAAT ATTTATGAAC ATCCTCATCT 13650
TTTTGAGGGA AGGGATTATA GATCATTCTA ATTCCATTTT CTAGCATTTG 13700
GTACCATTCT AAGCACATGA TAGGCACCCA TTTGGAGCAT TTTTGGCTTG 13750
ACAGAATATG CATTTAGAAT TGTTCAAATT AGAGGTGTCA GTGATGGGAA 13800
TTAGAATACT ATATAATTCT AAGTCATTTG ACTTAAATAC AAAAGAATGA 13850
TTTTCCTTGG TGGGGAATGG TGAAGGGAGG CAGGAGTTAA GAAGAGGAGA 13900
AGAGATCCTA AGTCATTTAT AAACTTCTCT GGAAAGACAG GTGTGTGAAG 13950
ACTTTTTAAA AAGTCATTCA CCAAATTGTG TGTGTGTGTG TGTGTGTGTT 14000
TTAAATAGAC TTTATTTTTT AGAGCAGTTT TAGGTTCACA GCAAAATTGA 14050
ATGCAAGGAC AGAGATTTCC CATAAACCCC CTGCCCACAC ACATGCATAG 14100
CCTCCCTCAT TATCAACATC CCCACCAGAG AGGTGTTTGT TCTAGTTGAT 14150
GAACCTACAC TGACACATCA TTATCACCCA AAGTCCATAG TTCACGGCAG 14200
GGTTCACTGT CGGTGTACAT TCTATGGGTT TGAGCAAATG TATAATGACA 14250
TGTATCCACC ATTATAGTAA CATACAGAGT ATTTTCAGTG CCCTGCAAAT 14300
CCCCTGTTCT CCACCTATTC ATCCCTCCCT CTCTGCATTT CCACCCCCAG 14350
CCCCTGGTAA CCGCTGATCT TTTTACTGTC CCATAGTTTC GGACGATCTA 14400
TTTTTCAGAC AGACACAGAG CTGTCTTTCC CTTAGTTTCT ATTCTATCAT 14450
TTCTTTCTCC CCATCCATCA TAAAAGGCTA TGAGTTTTTT TTAAGTGTTG 14500
AACACCATCC TACTTGTCAA GTTAAAACAT AAGCTCCTGG CTGGGTACAG 14550
TGGCTCATGC CTGTAATCTC AGCATTTTGG GAGGCTGTGG CAGAAGCATC 14600
ACTTGAAGCC AGAAGTTTGA GACCAGCCTG GGCAACATAG CAAGACCCCA 14650
TCCCTCCACA CACAAACACA CACACACACA CACACACACA CACACACACA 14700




CCCTCAGGTT CCTAGAAGAT CAGTCCTTCA ATTAGATTCA GATTGAGATG 14800 

CTTCCTCTTT TAAACAATGA TTCCCTTTCT ATCATGCCCA ATAAGAAAAC 14850 

AAATAAAAAT TAAACAATAC TGCCTGTAAT CTCAGCTACC CAGGAGGCAG 14900

AAGCAGAACT GCTTCAACCC GGCAAGCAGA AGTTGCAGTG AAGTGAGATC 14950

GCGCCACTGC ACTCCAGCCT GGGAAACAGA GCAAGATTCT GTCTCAAAAA 15000 

CAAAACAATG TGATTTCCTC CTCTAAGTCC TGCACAGGGA AATGTTAAGA 15050 

AATAGGTCCA CCAGGAAAGA AGGAAGTAAG AATGTTTGAC TAGATTGTCT 15100 

TGGAAAAAAT AGTTATACTT TCTTGCTTGT CTTCCTAACA GTTCTCCAAA 15150 

GCTTCGTACC TTGGCCAGAG GCTTGTCTCC TGCGTACCTG AGGTTTGGTG 15200 

GCACCAAGAC AGACTTCCTA ATTTTCGATC CCAAGAAGGA ATCAACCTTT 15250

GAAGAGAGAA GTTACTGGCA ATCTCAAGTC AACCAGGGTG AAAATTTTTA 15300

AAGATTCACT CTATATTTTA ATTAACGTCA GTCCGTCATG AGAATGCTTT 15350 

GAGAAAACTG TTATTTCTCA CACCTAACAA TTAATGAGAT TAACTTCCTC 15400 

TCCCCTCATC TGACCTGTGG AGGAATCTGA ACAAGAGGAG GAGGCAGTGG 15450

GCAGGTTTCC TTATCATGAT GTTTGTCATG TTCAGTGTGA GGCCTCACAA 15500 

AAAAAAAAAA AAAAAAAAAA GGCGTCCTGG ATATAACTGA GAGCTCATTG 15550 

TACAGTAAAT ATTAATAAAA CAGTGATTGT AGCTGAAGGA TAGAACTGCT 15600

TGGAGGGAGC AAGTGGGTAG AATCGCGTCA AACTAAAGAG CATTTCTAGC 15650 

CAAAGACACA ATGATAGATT GAAGGATATT TATTCTAAAT ATAGAATATG 15700 

GGTGAACGAG ATCTGTGGAC TTCTGGGCTC CAACGTTAGA TTCTGATTTT 15750 

AGCAAGCTTG TCAGGGGATT CTGATATTGA AAGGCTGTGG CCTTCACCTG 15800 

AGAAACCTGC CCTAGGGGGC CATGAAAATT TGTCCTGTCT TTCAGAAGTG 15850 

CTATCAGACA TCAAATGGAA GTTAAATCGT ATCTTAACAA TTACTAGGAT 15900 

GGGCGCAGTG ACTCACACCT GTAATCCCAA CACTTTGGGA GGCTGAGGCA 15950 

GGAGGATCAC TTGAGCCCAG GAGTTCGGGA CCAGCCTGGG CAACATAGAG 16000 

AGACGTTGTC TCTATTTTTT AATAATTTAA AGAGAAAAAA ATACTGAAAA 16050

TATTGTATAC ACCACTGAAT TATAATAATG TGTATATAAT GTATATATTC 16100 

ATTATGAGGA ATATTTGATT ATTTCATATA TTATATCTTT TCCTTCTGTT 16150 

TATTTTATCC AGTTATGAAG TATTTAGAAC AATTCATCAG TAATTGGGGC 16200 

TAAATTGACA GAATAGTAAT CAGAGAAAAT AGAAAAAGAC AGATGGGTTA 16250 

TCTTTGAATA CCAGGTTGGA GTTGTTTATG GGTTTGTTTT TTGTTTTGGG 16300 

GGCGTTTTTT TAGACAGAGT CCCACTCTGT TGCCCAGGCT GGAGTGCAGT 16350 

GGCACAAGCA TGGCCCACTG CATCCTTGAC CTCTTGGGCT CAAGCAATCT 16400

TCCCACCTTA GCCTCCTGAG TAGCTGGGAC CACAGGTGCA TGTCACCACA 16450

CCCAGCTAAT TTTTTTATTT TTTGTAGAGA CAGTCTTTCT ATGTTATCCA 16500 

GGCTGATCTC AAACTCCTGC ACTCAAGTGA TCCCCCTGCC TTGGCGTCCC 16550

AAAGTATTGG GATTATAGGC ATAGCCACCA CACCCAACCT AGTTTCTATT 16600 

TAGACTTGGC CCTTTCCCAC CAGTCATTTG TGTCCAAAAG ATCTCATAAA 16650 

TGTAGACAGG AAACTGTCCT TTGCTCATCA GTTTTCTTCA TCCTGTGTCT 16700 

AGGGGGATGG TCGGTGGGGG AAACTGGGGT TATGCAAGTT CCTCTGAAAC 16750 

ATCCTCTGTG AGCCCAGGGA TGGATGAGGC ACCAGCCGCC AGCGAGTCAG 16800 

TGTGCAGCTT TCCAGAAAGG AAGTCATCAG CCAGTCAGCC GGCCCTGGCA 16850 

GCCAGCACCC GGCAACCCTG CTGTCTTGTG ATAAAGAAAT GGTCTGCCTG 16900 

ACAGGATGGT GTGGATTTTT CTTTTTTCTT TTTTTTTTTT TTGAGACAGG 16950 

GTCTGGCTCT GTCGCCCAGG CTGGAGTGCA ATGGCGGGAT CTTGGCTCAC 17000 

TGCAGCCTCT GCCTCCCAGG CTCAAGGCAT CCTCCCACCT CGGTCTCCCG 17050

AGTAGCTGGG ACCACAGGCA CACACCACCA CGCCCAACTA AGTTTTCGTA 17100 

TTTTTAGTAG AGGCAGGGTT TTACTATGTT GTCCAGGCTA GTCTCAAACT 17150 

CCTGAGCTCA AGCTATCCAT CTGCCTTGGC CTCCCAAAGA GCTGGAATTA 17200 

CAAGCGTGAG CCACTGTGCC TGACCAGGGT GGATTTTTTC AAGTGCACAT 17250 

GTTGTGGTCC CAGAAGCTCT GATGGTACCA AATTCCAAGC GAAAAAAAGT 17300

CAATGGTTCC CACCCATCCT ACCTCCCATG ATGGCAAGAG GAAATCACCA 17350

CACTGCAGAT ACAGTCCATG TAAAACAAAT TGCTATGGAT TTTGAAAGTG 17400

AACCTTAAGA GAACTGCACT ATGTTTTCTT CATTAGAGTT CTCTGGTAAT 17450

TTCCAGCTTT TTTTTTTTTT TTTTTTAGAC AGTGTCTCGC TTTGTCGCCC 17500 

AGTGTCACCC AGGCTGGAGT GCAGTGACGT GATCTCGGCT CACTGCAACC 17550 

TCCGCCTCGT GGGTTGAAGT GATTCTCCTG CCTCAGCCTC CTGAGTAGCT 17600

GTATTTTAGT AGAGACGAGG TTTCACCATT TGGCCAGGCT GGTCTCGAAC 17650

TCCTGACCTC AAGTGATTCG CCCATCTCAG CCTCCCAAAG TGCTGGGATT 17700 

ACAGGTGTGA GCCACTGCAC CCGGCCAGTA ATTTCAAGCT TCTGAGGAGC 17750

CCTTTGAATT GTTAAATAAC TTGTAGCTAT GTCCAACATA TCCATGTTCA 17800

GTGTATGTTC GATATTTCTT AGGAAACCTG CCCTTGGTTG TTTTCTTTGT 17850 

GGTAATTCAT GAGCCGGCAA ATTTGACATG TGTTACAGAA TATACCTTTT 17900

CTCTGCTCTC CTACCTCATA ACCAGAACTT AATTATCCTG CTTTAGTCAC 17950

ATAAATAGCT AACTAAATAA ATATATGAGA TTTCAGTCTG CTCACTGTGA 18000

AAATAGACCT TCTAAATGAT CTCTTCCACT TGCAGATATT TGCAAATATG 18050

GATCCATCCC TCCTGATGTG GAGGAGAAGT TACGGTTGGA ATGGCCCTAC 18100 

CAGGAGCAAT TGCTACTCCG AGAACACTAC CAGAAAAAGT TCAAGAACAG 18150 

CACCTACTCA AGTAAGAAAT GAAAGGCACC CTAGAGATGT TCCAGCCCCA 18200

AAGATATTTG AATAGGTTGG ACTCGGGCAC CAATCTAGCA AGTCCTACGG 18250 

AAGTTGTATA AAGCTGAAAA TACTGAAGCA TTTCCCAAAT GGGAAATCCT 18300 

AAACTCAAAA CTTGCTTTTT GGTTTTTTTG TTTGTTTGTT TTTTCTTCAT 18350

CTGACATTGC TTAGTAGTCA CAGAATGAAA GATAAATCAA TCATTCATGA 18400

TCTAACAATG ACCTTCAGTG CTCTAAAAAA CTACGGAGTC AAGGAAAACA 18450

TGAATATATT CCTCATGTAA AATTAAAATA CAGACATATA AAGGGCAAAA 18500 

CATGAACATC ATTCATACCT TGAGGTCCGT CCCCCTCCCA GAAATAACCC 18550 

CCAGTATGCC TTGGTTTAGA GCATTAAGCA GGAGGGCCCT GAGTCACTCC 18600

AGACAGTCTT GACCACCAAG CAGCATTCTC TTTTTGTTTC CTCTGTGGCT 18650

TTTGCAAACA CAGGGCTAGC TCAGCTACCC ATTAGTATGT TTTCAGTCAC 18700

TTAAAGCAAG TGAAACAAGG AACCCCCTTT TTTTTTTTTT TTGAGATGGA 18800

ATCTCACTCT TGTCGCCCAG CCTGGAGTGC AATGGCGCAA TCTTGGCTCA 18850 

CTGCAACCTC CACCTCCCAG GTTCAAGAGA TTCTCCTGCC TTAGCCTCCT 18900 

ATTCATTATG AGGAATATTT GATTATTCAG TTCCTGTAGG GTAAAGATAT 18950 

TACCCCCGAT CATATTATTG ATTATTGAGT AGCTGAGATT ACAGGTGCCT 19000 

GCCACCACGA CCGGCTAATT TTTTGTATTT TTTAGTAGAG ACAGGGTTTC 19050 

ACCATGTTGG CCAGGCTCCA GGCTCGTCTC GAACTCCTGA CCTCAGGTGA 19100 

TCCACCCACC TCAGCCTCCC AAAGTTCTGG GATTACAGGC GTGAGCCACC 19150 

ACTCCTGGCC ACAATCCTTT TTTAACTATG AAATATATTT TTATCTGAAG 19200

TTTGATGTTT ATACCCAACT GAGGGATGAT GTTCCCATAT CTCAGTTAAA 19250 

GAAATAACCT GCTCAGATAC TTCAAGCTCT TCTTTTGACT TTTGAAAATA 19300 

AATGATCTTG AAGTTACTAT ACTTTGTTTG GGTTAGTTAA CATTATTTAA 19350 

AGTATATTAT TTTAATTAAT TATCTTTGTA AGATTTTACT GTATACTACC 19400

TGGAGTTCAA TGTATCAGAT GGATTTCAAA TTTATGTACA TTTTTTATGT 19450

ATATGGTACA GAAAAAAATG TGATCCATAA GAAATCAGAA AATAGCGCAT 19500 

ATGCTAATAG CTAATGTTGT CCTCTAAAAA ACTTATTTTT GCATTTTTAA 19550 

GAGGGGGATA TACTCTGACA CTTTAATAAG TGTAATTAAT TATTGACTGG 19600 

AATTTGGCAT GAGGCAGGGC CATTTCAGAT CCCATTAAAG GAATGACACA 19650 

TACCAGAGAA CCACAGAAGT AAGGCCACAT TTGTAATAAA TCATTATAGC 19700 

TCTGCTAGGA GAAGACCCAG TTGTATTAGG TAATTAATGG ATTTGCTCTT 19750 

AAAACACATG TCCCGGAAGA TATAGGTGAG TCTTGGGGGG CCGCATTAAA 19800

CATTATACCA ATGTATCTTA CATTTCTAAG AAAGTTTTAC TACTTTACAG 19850

GATCTTTCTG TTACCAAAAT GGAAGGTTTC CAACTCCAGG ACTTGGCTTT 19900 

CATAGTTCCT ACACCAGGGG AAATGCCTTC CTTTGCTAAC TATGCAACCA 19950 

GGTTAGTTAG TGTAAGTCCA GCCACCCTGT TGGCAATGCT AAAAGGTACA 20000 

ACAAACACAG AATTTTATTT GCATTTGTAA ACATTTGATT TCTGGCTCGA 20050

AATTTTCAGT TTTCATGGGC ACGTCATGGA AACAGAAATC TTCTGTGTTT 20100

AGTTTGGGCA CCTACTCATT GTAGTGACAA ATATTTCAGA AGCCAATAGG 20150 

GGATTCCACA AATTGTTCTG AACCTGTGGC TGAGACTGGT AATGGCTGAG 20200

TGACATGGGG ACATACCACA AAAGAAGAGG TAGCAAAAGG CTGCTGAGAT 20250

AAGGACATGT TCATTGCTTA GCTAGTGGCC TGCACCCTTA AAACACATGT 20300 

CCCAGGCTGG GTGCTGTGGC TCACGCCTGT AATCCCAGCA CTTTGGGAGG 20350 

CTGAGGCGGG TGGATTACCT GAGGTCAGGA GTTCGAGACC AACCTGGCCA 20400

ACATAGTGAA ACCTCATTTC TACTAAAAAT ACAAAAATTA GCCAGGCATG 20450

GTGGCGGGCG CCTGTAGTCC CAGCTACTCA GGAGGCAGGC AGGAGAATTA 20500

CTTGAATCTG GGAGGCAGAG GTTGTGGTGA GCCGAGATTG CGCCACCGCA 20550

CGCTAGCCTG GGCGACAAAG TGAGACTCTG TCTCAAAAAA ACAAAAACAA 20600

AAAACAAACA AACAAAAAAC AACAACAACA AAAAAACGGG TATCCCAGAA 20650

GATACAGGTA AGTTTTCTAA CACAGGTCCT CTTGTATGGT GCGTTCCACT 20700

TAAGTAGAAG ATGACAAAAA CATTTGTCAT GAGAATATAG ACTCACATTT 20750 

TAAACCTGTT TGAGCAGGAA AAGGAAGCAA TGTTACAGAT GTAATTCTGG 20800

GTGTGACTGC AGAAAGGATG ACTCCCTTAT TAAAGTAGTC ATCCTGAGTG 20850

AGCTAACTCT TTGTACTTCC TCTTCTCCTC CTGTTCCCCT CATCACCCCA 20900 

TTCTTCCGTT GCCTACACCC AGGCCCACAT TGGATGCTGA CATAGACTTA 20950

CATGGTACAG TCCAAGGGAA AGATCTGCCA TTTTTTTCAA TGTGTCATCT 21000 

TGGTTATCTT CATTCCAAGG ATCTCTCCAC TCTTTATACA GTAAGAGATG 21050 

AGAGTCTGGA AAGGATTGGG AATAAGATAA TGAATTGTAA GTTTTAAATT 21100 

GTTCTTCGTA TTTTGGGGAA GGAGTAGGCT AGGTGGTCCT TCTGTTTTTT 21150 

TTTTGTTTTT TTTTTTAAAG TAGATGTGGC CAGACGTGGT GGCTCACGCC 21200 

TGTAATCCCA GCACTTTGAG AGGCTGAGGC AGGTGGATCA CTTGATGTCA 21250 

GGAGTTCAAG ACCAGCCTGG CCAACACAGT GAAACCCCGT CTTTACTAAA 21300 

AATACAAAAA CTAGCCGGGC TTGGTGGCGT CCACCTGTAG TCCCAGCTAC 21350 

TGCAGAGGTG GAGGCAGGAG AATCACTTGA ACCCGGGAGG TGGAGGTTGC 21400 

AGTGAGCCAA GATCATGCCA TTGTACTCCA GCCTGGGCGA CAGAACAATA 21450 

CTCTGTCTCA AAAAAAAAGA GAAAAGAAAA GAAAAAAAGA ATGGATTTGA 21500

ACTCAGTCGT CAATAGCCTC TATTCCAGGA GATGTTACAG TTGATTATGT 21550 

TATAGGGGGT GTATAATAGA ATTTCGAGCT ATGTAAATTC CAAGTGCATT 21600 

TGGAAGAATG AAGAAATGGA GGAAGGGTAA AGTATGAGTG CAAGCATTCC 21650 

AGGTTTTTTG AAAATGCTAT AATCTTTGTT CAGGGCTAGT ACAAAGTGCT 21700 

ATTTAGCTGT AAGGGTTTTT TGTGATTTAC AGACAGTTTT CACATGTGTC 21750 

ATTTCAACCT TGGTTTTATG GCGAAGGCAT GTGATGGTGC TTGTCCCAGG 21800 

ACTTTAGATC CATATCTGAG GTTCCTGTCG GGCAAAGATA TTACCCCTGA 21850 

TCATATTATA GTCTATAAGT GGGAGAGTTG TGCCTGGAGC TCAAGTCTTA 21900 

TGATTTCTGA TCCAGGGCAC TTCCTACAAC ATGATTTTGC AATATAAAAG 21950 

CCTATAATGT GTGACTAAAG CAGGTCACTC ACCCCTTGTA ACAGACTCTA 22000 

GTAATGGTAC TGCCACCAAA CGGCTGCGTG ATATTGGGCA AAGACTTACC 22050 

TTATTTGAAT CTCAGTTTCC TCCTAGAAAA ATGAGGGTGG AGGTTAAGCA 22100

TAGGCTGATG ATCCTAAAGC CTCCATACTG CCCTAAACTG TGGCTCTAAG 22150 

ATCCAGTAGA ATGCTGGGTC ACAGGACTCT AGGGAGCTTT TCAAACCCAA 22200

ATGTCTGTCA TTCCTTGATG GTAGGCAGCA GTTTATGGAA GTGGGCGACA 22250 

CAGCAAATAT CAAAATACCT AAAGCAGCTT GCAAGAGTTG TTTCTGCCTA 22300

GTGGTCTTTA TAGTTAATAT TAAATAGTTA ATTTTTTTTT TTTTTGAGAC 22350

AGAGTCTTGC TCTGTTACCC AGGCTGCAGT GCAGTGGCAC AATCTCGGCT 22400

CACTGCAACC TCCACCTCCC GGGTTTGAGC AATTCTGTCT CAGCCTCCCA 22450

AGTAGCTGGG ACTACAGGTG CATGCCACTG CACCCAGCTA ATTTTTGTAT 22500

TTTTAGTAGA GACGGGGTTT CACCATATTG GGCAGGCTGG TCTCGAACTC 22550

TTGACCTCAG GTGATCCACC TGCCTCAGCC TCCCAAAGTG CTGGGATTAC 22600

AGGCATGAGC CACTGCACCC AGCTTAAATA GCTAATATTT AATATTATTC 22650

TATAGTTATT CAAGTAATTC AGGCCAAAGA CTTAGAAACA AAACAAAAAG 22700

C C AC 7 T 7 GGAG.w-.GGG 7G7AAG777C- C 7 AG A 7 A G A 7 A G AG A 7 C 7 7 7 22 7 50 
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CTTTTTTAAC TACAAGAGTT CAGGAATGAA TTACTCTTTA ACAAACGACT 22800 

ATAGATATAC ATGAAAATTG GAAGGACTTA TTATGCATAT GATAATCAAT 22850

TTAAAGACAA CACTTAAAAT TATATTGTTG CCACTCTCAA AAAGTGGTAA 22900

TAGAACAGCT AATGGTTTAA AAAGCAGAGT ACAGAAGTTC CCAAACTTAT 22950

GGCACCTTAA TATCGCAGAA AACTTTTTAA AGCATGCCTA GGCCACAAAA 23000 

AATACCTGTA TTTTGATTAT TAAATTGTAA GGTCTACACA ACCTAATAGT 23050 

AATAGGTCCA ATAGTAATGC TGTCCAATAG ATGTTGATGT TTTTTTCCTT 23100 

GCAAACTTAA AAGATCCTAC AGTGCCTCTG TAAATAGCAC TGCCTGGTTA 23150 

GAGTTGAATT TCAGATAAAT AATTTTTTTC ATGTTAATTA TTTTTCTTTT 23200 

CTTTACTTTT TTTTTTGTTT TTTTGTTTTT TTGTTTTTTT TTTTGAGACA 23250 

GGGTCTCATT CTGTTGCCCA GGCTGCTGTG CAATGGCATG ATCATGGCTC 23300

ACTGCAGCCT TGACCTCCCT GGGCTCAGGT GATCCTCCCA CCTCAGCCTC 23350

CCAAGTAGCT AGCTGGGACT ACAGGTGCTT ACCATCATGC CCGGCTAATT 23400

TTTGTGTTTT TTGTAGAGAT GTGGTTTTGC CATGTTGCCC AGGCTGGTCT 23450

TGAACTCCTG GGCTCAAGTG ATCCGCCCGC CTCGGCCTCC CAAAGTGCTA 23500

GGATGACAGG CATGAGCCAC TGCACCTGGC CCCTGGGCGA AGTATTTCTT 23550

AATGGTTACA TAGGACATAC ACTAAACATT ATTTATTGTC TATATGAAGT 23600

TCAAGTTTAA CTAGGTGCCC TGCACTTTTA GTTGCTAAAT CCTGTAGCTG 23650 

TACCCATGCA TTCACTGGTG CTCCCCAGCT TGCCTTGCAC AGAGTTTGGA 23700 

AACCATAGTC CTATAACTCT AGGCCAATTT TTTAATGTAA AATTTGATTC 23750

ATTTTAAATT AATAAATAAT AACAGGAATT TTTTTAAAAA TTGTTTTAAA 23800

TATAATTAAA ATTATCAAAA TATTTTTTAA CTGAACTTGT GACTAGAGAT 23850

ATTTAGATTA TGAAGAGTGG GGTTTATGCT AACTAATGAC AGTCTGGCTA 23900

TGCATGTGGA GCACTGAGCT ATAAATTGTG GCTTCCCCAA TTCTCCTGAT 23950

GTCACTTGAA CAAAACCTAA GTGTCAGACC AGAGCTTCTG GTATCTTCCA 24000

TGGGATTTCA TTCAACAGCT GGAGCAAATG AAGTCAGATT GATTTTTTTT 24050

AATTTGTCCA ATTTTGTTGT CTCAAAAACA TAATTATAAT CATTTATTAG 24100 

AACTAGAATT TCTTCAGTTT AACAACAGAA ATAGTTATTC ATTATGAAAA 24150

GCGAATCTGG AGGCCTTCAT TGTGGTGCCA ATCTAACCAT TAAATTGTGA 24200

CGTTTTTCTT TTAGGAAGCT CTGTAGATGT GCTATACACT TTTGCAAACT 24250

GCTCAGGACT GGACTTGATC TTTGGCCTAA ATGCGTTATT AAGAACAGCA 24300

GATTTGCAGT GGAACAGTTC TAATGCTCAG TTGCTCCTGG ACTACTGCTC 24350

TTCCAAGGGG TATAACATTT CTTGGGAACT AGGCAATGGT GAGTACCCCA 24400

GGGAACAATT CATTAATAAG GAGATTCCCC ACTAGCATTA TTTCTTTTCT 24450

TTTCTTTTTC TTTTCTTTTT TTTTTTTTTT GAGACAGAGT CTCGCACTGC 24500

TGCCCAGGCT GGAGTGCAGT GGCGCCACCT CGGCTCACTT GAAGCTCTGC 24550

CTCCCAAAAC GCCATTCTCC TGCCTCAGCC TCCCGAGTAG CTGGGACTAC 24600

AGGCACCCGC CACCGCGCCC GGCTAATTTT TTTTTTTTTT TTTTTTTTTT 24650

TTTTTTTGCA TTTTTAGTAG AGACGGGGTT TCACCGTGTT AGCCAGGATG 24700

GTCTTGATCT CCTGACCTCG TGATCTGCCC TCCTCGGCCT CCCAAAGTGC 24750

TGGGATTACA GGCGTGAGCC ACCAGGCCCG GCTAGCATTA TTTCTTATGA 24800

CACTTTTTTT TTTTTTTTGA GACGGAGTCT CGCTCTGTCG CCCAGGCTGG 24850

AGTGCAGTGG CGCCATCTCG GCTCACTGCA AGCTCCACCT CCCAGGTTCA 24900

CGCCATTCTC CTGCCTCAGC CTCCCGAGTA GCTGGGACTA CACGCACCCG 24950

CCACCACGCC CGGCTAATTT TTTTGTATTT TTAGTAGAGA CGGGGTTTCA 25000

CCGTGTTAGC CAGGATGGTC TCTATATCCT GACCCCATGA TCTGCCCGCC 25050

TCGGCCTCCC AAAGTGGTGG GATTACAGGC GTGAGCCACT GCGCCCGGCC 25100

AACACTCTTT TTATTATTAG CAAATATACT TCTGCCTGGG CACATTCTTG 25150 

CAAGTGCTCA ACAATGCAAC TTTTGGAAGT GCATGTGGCA GAAACTCCTG 25200 

CTGTATTTAT TCCAGAACCT ATTATTGCTA ATCCCAGTTT ATGTTACATT 25250 

TGAAGTGAGA ACCAGTTGGA GCCAGCAACG TTCCCAGCTC CAAAGTTCCC 25300

TTGAGATTTT CAGAATCACT TAACCCTATT ATGCTTGGCA ACCTGGACTC 25350

AGCAAAACTG GGAAGTCAGC AGTTTGTTTT ATTCATCCCT TCCTTTCTCA 25400

GTTTCTCAAA TGTGTCAGTT AATCTCAGTA ACCCCATTGC AACCTTCATT 25450

ACCTGCCCAA GCGGTCTAGA ACTTGCCAGT ATAGAATCCT ACGTGGGTCA 25500

AGCTCCTGAC TGTCTCCTTC TTCACTCTTT TTTTGCAAAG AACTTGTAAA 25550

TTTTAACTAT AAGTATTCAT GATTCGCCAC ATTTATTCAA AACATAGAGT 25600

GCTTTTTCCA CATATCAGCC AATGGAAATA AGGATTAAAT GGGAAATGAA 25650

ATGTAGTAAT AGGATAAGCA CAAGTCTTCT TCCTGCTCAA ACTTTTTTTT 25700 

TTTTTTTTTT CAGACAAGAT CTTGCTCTGT TACCCAGGCT GGAGTGCAGT 25750

GGCGTGTTCA TAGCTCAATG TAACCTCCAA CTCCTGGGCT CATGCAATCT 25800

CTCACACCTC AGCCCCCTGA TTAGCTAGGA CTACACTATG CCTAGCCAAT 25850

TTTTTTTCTT TTGTCTGGTT GTGTTGCCCA GGCTGTCTCG ATCTCCTGGC 25900

CTCAAGTAAT CCTCCTGCCT CGGCCTTCTA AAGTGCTGGG ATTATAGGCA 25950

TGAGCCACTG TGCCCGGTCT CAAACCTTTT TTTCCAAAGT AAATGAAGTT 26000

ATTAGATATG GAATATAGTC TAGTTCCCAG ATATCCATAT CCATTGGTTT 26050

ATTACCCTCA TTATTAACTT CAAATTGTTT AATAGACCCT CATATCTCAG 26100

TTATACAGTT AAAATTTTTG TTTTGTTTTT CTGGAGTATC TTATTTATAA 26150 

CTATGAGTTT TACTTTACTT ATTTATTTTA TTTTTTGAGA CAGACGCTTG 26200

CTCTGTCACT CAGGCTGGAG TGCGGTTGCG TGATCATGGC TCACTATGGC 26250

CTCGACCTTC TGGGCTCAAG TGATCCTCTC CCTCAGCCTC CCAAGCTGAG 26300

ACTACAGGCA TGCACCACCA CATCTAGCTA ATTTTTTTTT TTCCCCATGG 26350

AACAAGGCTT TACTATGTTA CCCAGAGTGG TCTCAAACTC CTGGCCTCAG 26400

GGGATCCTCC TGTCTCAGCC TACCAAAATG CTGGGATTAC AGGCATGAGC 26450

CATAGCGCCA GACCTGGTTT TACTTTTCTT GACTTTGAAT TACAAGTTTT 26500

TGTAATTTGG AAAATGTTTT GTTGCTTTTA AATACTGCTG TATGTTTGCT 26550

TTTAAATACA ACATTTCTCG ATATATATTT TGAGAATTGC TGTCTTTCAG 26600

AACCTAACAG 777CC77AAG AAGGC7GA7A 7777C.--7C.-_-. 7GGG7CGCAG 2 665C 

TTAGGAGAAG ATTTTATTCA A T T G C AT AAA C7TCTAAGAA AGTCCACC77 2 6^00 

G AAAA A.7 G C A AAAC7CTA7G GTCC7GA7G7 TGG7CAGCC7 CGAAGAAAGA Z 6" 50 



CGGCTAAGAT GCTGAAGAGG TAGGAACTAG AGGATGCAGA ATCACTTTAC 26800

TTTTCTTCTT TTTCCTTTTG AGACAGAGTC TCACTCTGTC AGCCAGACTG 26850

GAGTGCAGTG GTACAATCAT GGCTCACTGC AACTTCGACC TCCCAGGCTC 26900

AAGCAATCCT CCCATCTCAG TCCCACAAAT AGCTGGGACT ACAGGTGCAC 26950

ATCACCACAC CTGGCTACTT TAAAAAAATT TTTTTGTAGA GATGGGGTCT 27000

CCCTGTGTTG CCCAGGCTGG TCTCTTGAAT TCCTGTGCTC AAGCCATCCT 27050

TCCACCTCAG CCTCCCAGAG TGCCAGGATT ACAGGCATGA GCCACCACAC 27100

CCAGCCACCA CTTTTCTTAA AAAAAAAAAA AGATTCTCTC TGGTAGACAA 27150

TCCTCAATAG TCCACATGTT ATTAAACAAT CTGCTGCCTG AATACATGAT 27200

TTACCAAAAA AAGGAAATTT TGACGGGTTC AGAATATCAA GGGATCTGAG 27250

GCAAATGTCA CCTATGATAA AATTTGCTAT CAAAATTAGG AAGTTTGTGT 27300

TTACCTGATC CTAAAGCAGT AACCAGCCCA TTTCTAGGGA ATAAAACTCT 27350

CATGCGTATA TTGTGCATAT ATATGTATTA TATGACTGAG TGATAATAAA 27400

ATTTTTTTTC TAGCTTCCTG AAGGCTGGTG GAGAAGTGAT TGATTCAGTT 27450

ACATGGCATC AGTAAGTATG TCTCCTATTC TTAATACTAG GAAAGTAAGG 27500

CTAGCTTTAT TTATTACCTA GTATTCAAAA AGTTAGTTCA TTTAACTGCC 27550

AATTGACTGC AGTTCAAATA AGAAACAAAT AGTGTCTCAA GTAGCACTGT 27600

ACTCCAATTT TAATATTAAT AAAAAAAATT TTAAGTTATT TTAAATAATG 27650

TAGTGGTTTC TATAAAGATC ACTTTATACA GAAGAACAGT GCCAATTAAC 27700

CCATGGAACA TATAAGTAGC TAAAACCAAT TGCTTGCCAA AGAACCAGTA 27750

ACCCAGGAGT ACATGTCCTT GCCACTGTGT TTTTTCAAGA CAGAGTAACT 27800

GATTTCTAGT TACTTGCATA GAATGGACTC CTCCTCATAA CTCCCTTCCA 27850

TCTTGGTCTT TCCCTAGTAG AACTTCTACC TTTTTTTAGT AACAGGTGAG 27900

TGGGAGAGGT AAGAAGGAGA ATAAGGTCAG CAATTAACCT AAAAGCAGAA 27950

AGTAAAATTT GTTATTTTTT TTCTGAATAT TTTCTGTGTA ATTTAGCTAC 28000

TATTTGAATG GACGGACTGC TACCAGGGAA GATTTTCTAA ACCCTGATGT 28050

ATTGGACATT TTTATTTCAT CTGTGCAAAA AGTTTTCCAG GTAATAGTCT 28100

TTTTAAACTT TTTAATGTAA AACCAGAATC CTTATTTTAT AGTCTAGCTA 28150

GTTCTAAATT CTATAGGTAT GTATATTTAC ATGTTTTTCT AATTTTAGAG 28200

AACAAGCACT ATGACTTATC CACTGTTAGT TTTCCCCTTA GCATTGGGTC 28250

TTACCCCATG TACGTGATTA GAAATTTGAA ATATTTCCAA TAGCCTTTAG 28300

TAGAATTAAC TCACATAGAT GATAAGAATG GGTTGGTTCA CTTCATGTTC 28350

CTTCCACAGC CTACTATTTC AATAAAAGAA AGTTTCCCAA GACCTAAATG 28400

ACTATGAACA TATTTTATAA CTATATAGGA GGGGTGGGTC TAGGAATACA 28450

AAGTTTTGAA TGCTGTTAAT CTTCAACACC ACAGTTGAAA CCACAGGTCA 28500

GCTTTTTTGC AATTACCATG GATACTTTTC TGTTCTATAG GTGGTTGAGA 28550

GCACCAGGCC TGGCAAGAAG GTCTGGTTAG GAGAAACAAG CTCTGCATAT 28600

GGAGGCGGAG CGCCCTTGCT ATCCGACACC TTTGCAGCTG GCTTTATGTG 28650

AGTGAAGCAG CGCTGGCCTT AGGGGTCAGA GTGCAGCTCT TCTCCATCCT 28700

TCTATTCTGC TGAAATAGCT CCCCAGCCAA AAAGCAGATC AAAGACCGTT 28750

TCAGTGGCTG AGCCCCAAAA TTCATGCCAG ATTTTGCAAG AAAATGATTT 28800

ACTAAAGCTT GAGGGACATC TTTAACAAGT GTTCCAAATT AATCACTATA 28850

AGGATGAATT GTTTCAGAAA TTTTGGCCTT TAATTATGGC CCATAAATAT 28900

GTCAAGTAGT CCTTACTCTA AAGAAGTACA CTGTAAAAGA ATGCATATAG 28950

CCGGATATGG TAGTTCCCTG TAATCCCAAT ACTTTGGGAG GCCAAGGTGG 29000

GAGGATTGCT TGAGCCCAGG AGTTTGAGGC TGCAGTGAGT TATGATGGTG 29050

CCACTGCACT CTAGACTGGG CAACAGAGTG AGACTGTCTT TTTTTTTCCC 29100

CTCTGTCACC CAGACTGGAG GGCAGTGGCA CGATCTCACC TCACTGCAAC 29150

CTCTGCCTCC CGGATTGAAG CGATTCTCCT GCCTCAGCGT CCTGAGTAGC 29200

TGGGACTACA GGAGTATCAC CGCACTGGGC TAATTTTTGT ATTTTTAGTA 29250

GAGACGGGGT TTTGACATGT TGCCCAGGCT GGTCTGAAAC CCATGAGCTC 29300

AAGTGATCTG CCTACCTCAG CCTTCCAAAA TGCTGGGATT ACGGACATGA 29350

GCTACCACGC CCGGCCACAC CCTGTCTCTT AAAAAAAAAA AAAATGCAAG 29400

TTAGAGCATA TTACAGCTTT GTCTCTCAGG AGGATACTTA GTGTATGTAG 29450

CTATAATTCA TAGATTCCCA AGAAGTTTAG AGCCTAAAGT ATGAGGTCCC 29500

ACCAGAGGGG CTATCATTAA ATTTAAAGAT TTGTTAAATC ATCTCATTGT 29550

CCAACACCAC AAACTTGATT GCTTTAAAAT ACTGGTTTAG TTACATTTAG 29600

TAACTCTATT AGTGCTTTTA ATCTATACTG CTATATCCTC ACATTGAGAT 29650

TTTTTTTCTT TTCTCTTCCA TCTTCATTCT TTTTTCTCTC ATCCTCATTC 29700

TTATAAGCCT AGAATACATC ACAAATCCTT TATGCCCATG GAAGCAAGAG 29750

GAATAAAGAA TGGAGATGTT TGTTTTGCCA TTAACTAAAG ATCTGGGGTG 29800

TCGGGGAGAA GGGGGATAGA GAAGGAGAAG TGGGAAGAGG TGTCCATAAT 29850

AGCTTAGGTG CAATTCTGCT TATTTTACAT TTTACCCCCG CTGACTGCCA 29900

CTTTTTCTTC AGCCCTCACA CATTGTTTGT GCAGGGACCT CATAGGACCA 29950

GGAATTGTCT ATAGAGGTGG GAATTTGTCT CACCCTGAAA GGGATACCTC 30000

TAGCATGGTA ATAGTCTTCT AGGATTTGTT ATCATATGGA AAGATGTAAA 30050

GGGAGGGATT CTGCTGCTGC TGCTGCTGCT GCATGCAGTT GCCATTTCAT 30100

TTAAATGACT TATTTATAAT TGATGACACT TTTCTGGCTT CCTGTTAATT 30150

CCTCCCTCAA AGATCAATAA ACCAGAACCA GGCATGGTGG CATGCACTTG 30200

TGGTCCTGTA ACCACCCAAC AGGTTCACCT TGCCTGCTGT CTAGATAGAG 30250

CCAATTATCA AGACAGGGGA ATTGCAAAGG AGAAAGAGTA ATTTATGCAG 30300

AGCCAGCTGT GCAGGAGACC AGAGTTTTAT TATTACTCAA ATCAGTCTCC 30350

CCGAACATTC GAGGATCAGA GCTTTTAAGG ATAATTTGGC CGGTAGGGGC 30400

TTAGGAAGTG GAGAGTGCTG GTTGGTCAGG TTGGAGATGG AATCACAGGG 30450

AGTGGAAGTG AGGTTTTCTT GCTGTCTTCT GTTCCTGGAT GGGATGGCAG 30500

AACTGGTTGG GCCAGATTAC CGGTCTGGGT GGTCTCAAAT GATCCACCCA 30550

GTTCAGGGTC TGCAAGATAT CTCAAGCACT GATCTTAGGT TTTACAACAG 30600

TGATGTTATC CCCAGGAACA ATTTGGGGAG GTTCAGACTC TTGGAGCCAG 30650

AGGCTGCATT ATCCCTAAAC CGTAATCTCT AATGTTGTAG CTAATTTGTT 30700

AG7CC7GCAA .-.GOT A G A 777 G77CCCAGGC AAGAAGGGGG TCTTT7C AGA 30750



AAAGGGCTAT 

TTCCCAAAGT 

AGGTTAGAAG 

AATTTCCTCA 

GGGAGGCTGA 

AGAGCTATGA 

CTGTCTCTAA 

AAGATGGTGT 

AGCTTGGTCT 

CCGAATGGGA 

ACTACCATTT 

TATTTTCCTA 

TGTTAACCTC 

AGATGATGAA 

TGTGTGTTTG 

AAGCCATTTG 

ATTTTATGAC 

TACAAAGTAA 

CATTTTTTAT 

TACCATGTGT 

ATAATCTTCA 

AACTGTATCT 

GGAAGATCTG 

TGCGAAGCAT 

CTATTTTCTA 

GCTAAGAAAT 

TGGTTTGTGG 

CAAGCTTGTG 

GTGATCTTGC 

TATAGAGTTA 

TACTCCTGAC 

ACCTTTGCCT 

AAGGTTCTCA 

CACTTGTAGA 

ACTTTATTTT 

TGGATCCATT 

TTTACAGGAA 

AAATCCTGCT 

CATGCTTATG 

AAAAGTGGAA 

ACTTTCCCTC 

ATTTATTTAT 

TTGTGCAGGC 

GCCTCCCAGT 

CGCAGTTTTT 

AAATACAGAG 

CCCCAAAATA 

TGGGAAGAAA 

AAAGCAGGAC 

AAAACAAGAG 

GATGGCACTA 

AACTTGCCAC 

TCCAAAAATT 

TACTTGTTCC 

TAACAAGCTT 

CCATCCCAAC 

CATACTTTGA 

CCTATTAGAT 

ATATTTCAGT 

TTTGGGAAGC 

GCTACGGCAA 

TGTGGTCCCA 

GGAGGTTGAG 

GGTGACAGAG 

CCTTTTTGTA 

ATTCCTTAGT 

GTCTAAAATA 

TAT ATT AC AT 

TTCCTCCCTC 

AATTTGTTCT 

CTATGATATC 

GGTAGTTATA 

TTTTGCAAGA 

TGTGTCTGGT 

TACTTTATTG 

GATTTTTTTA 

TGGAGTGCAG 

TCAGGTGATC 

AC AC C AC C AC 



TATCATTTTT 

TAGTTCAGCC 

CAAGATGGAG 

GTTATAATTT 

GACAGGAGGA 

TCACGCCACT 

ATAAATAAAT 

GCAATTAGAA 

TGCTCTGTCC 

ATAGAAGTGG 

AGTGGATGAA 

ATTCTAGTGG 

CTATGAACAG 

TTAGAAGGAG 

AAGAGAAGAA 

CAGTATAGTG 

TCATTATACA 

ATCAAAGTTA 

GTTATGAAAT 

TTGCTTAAAA 

AGATATTTAT 

GGTGCTAAAT 

TATGTCTAAA 

GACCTTGATT 

AGAATAATTC 

TTTGCAAAGA 

ATAGACTTCC 

GAACATATTA 

AAAAATGGAA 

AACTAAGCAT 

CTTTTTATCT 

TACGAACTCT 

GTCAGCTAGA 

TCCAAGAGAA 

ATTTGCATGA 

TTTTAGATAA 

GCCATACTGT 

TCTGAGGCCT 

TCTCCTTTGA 

TTTAAGCAGA 

TACTTTCAAG 

TTATTTATTA 

TGGTCTCAAA 

GTTGGGATTA 

TAAGAAAAAC 

GAAAGTATAT 

TGCCACTTTG 

C AC AT AG A AG 

ATGAATCTTA 

TTAATCACTG 

GAAGAATCTA 

CCCAGAGACT 

TGCTCTATAA 

CTGGTATTTT 

CTGTTTGTTT 

TAAGAACTAA 

TCTTGTTTAA 

TACTTTGAAG 

ATGTGCTAGG 

TGAGGCAGGA 

C A A A A A AT C A 

GCTACATGAG 

GCTGCAGTAA 

TAAGACCATG 

AAAACACAAT 

ATCACCAAAT 

TTGTTGATAG 

TTGGTTGACA 

TCTTTCATCT 

ATAGTATTTC 

ATTTAGCATG 

CCTAGAAGCT 

ATTCTTTATT 

TTTGATCTTG 

GGGTGGGGGG 

ACTGTTATTT 

TGGCACAATC 

TTCTCACCTC 



GTTTCAGAGT 

TACACCCAGG 

TCAATGAGGT 

TTGCAAAGGC 

TTAATGGAGC 

GCACTCCAGC 

AAGTAAATAA 

TTGAGCGATT 

CAGGTGGCTG 

TGATGAGGCA 

AACTTCGATC 

AGTAGATTAA 

TCAGTCCTCT 

CCTTAGATAG 

ATCAAGAGCT 

TGGATTTTGT 

AGACAAAATA 

TAATTGCCTA 

TGTAATTTAT 

ATCTCATGCT 

GAATAAAGTC 

CAGGAAATGT 

TATATGTCAG 

TTTATAGTCT 

CTAAAAGAAT 

GCGTACGTGA 

CAACAAAATT 

GTCATCTTTT 

TTTATCTTTC 

AGTAATTTCA 

CATCCAAATT 

TTGTATATGC 

AAAATGTGCA 

TTAGACTTAA 

CAGTCCTGTG 

GGAAGTTCAA 

AGTCCTATGT 

GCATACTTTC 

AAACATTGAT 

GAAACAAAAG 

AAGGAAAGTT 

TTTTAAAAAT 

CTCCTGGGCT 

CAGCATGAAC 

TTTTACTATA 

GAACCCACTT 

GCATAAGGAT 

AAAAGTTCTC 

AAAGTCCCCC 

AAGATAACTT 

TATTACATAC 

AAAAATCCTT 

GCTGGAGTTC 

CTGTTAACAT 

TTCTCCTGTT 

AGAGTAGGAG 

TCCGTAACCC 

CAAATTTCAG 

TGTGGTGGCT 

GGATCACTTG 

AAAACTTATC 

AGGCTGAGGC 

GCTGCATTCA 

TCTCAAAAAA 

ACTTTTATCA 

ATTTTGTCAG 

TTATTCAAAT 

AGTCTCTTAA 

CTTGTAATTT 

CTACATTATA 

TTCCTCTGTC 

TGAGTTTATT 

ATCTGCTTCT 

ACAGCTACTG 

AATAAGGTTT 

TGAGACAGTG 

ACGGCTCACT 

AGCCTCCTGG 
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CAAACCATGA ACTGAATTTC 30800 

AATGAAGAAG GACAGCTTAA 30850

CTGATCTCTT TCACTGTCAT 30900 

GGTTTCAGTC CCAGCTACTT 30950 

CCAGGAGTTT GAGGTTGCAG 31000 

CTGGGTGACA GAGTGAGACC 31050 

ATAAATACAT AAATAAAATC 3110 0 

TTGTTTCCAA ACCTCAAGAA 31150 

GATAAATTGG GCCTGTCAGC 31200 

AGTATTCTTT GGAGCAGGAA 31250 

CTTTACCTGT AAGTGACCAT 31300 

AGTCAACTCA GGACCTCTGG 31350 

CAGTAACTAG CCAAATCATG 314 00 

CATCCAATCT AACATTTTTT 31450 

AGGAATAACT TTTTAAAGGT 31500 

TTAAAAGGGG ATAATTTGAA 31550 

AGTTGGATTT TCAAATGTTT 31600 

CAGTACGCAA AGCTTCAAAA 31650 

TTAACCTTAA AATGAGCCAG 31700 

AAGAATTTAC TATGTTGTTA 317 50 

TTATTTCTAA TCCTTCCTCC 31800 

TTCTTCCCAA AAAGCCTCGT 31850 

GGATAATACA GATGTAGCCC 31900 

AAAATGTCAT TTGCAGATAT 31950 

TATTTGAATG TTGTAGGAAA 32 000 

AAATATAAGC TAGGCTTTTG 32 050 

GCTTTTTATC TATAGTGATC 32100 

TTTAGAAAAT TCTTAGAAAA 32150 

CCCAAGTATA TTCTGTCATG 32200 

CCAGACAAAC ATTCAAAATC 32250 

TTCCCAGGGC CCAGACATAA 32300

ACTAAATATG CTTCTCCTTC 32350

AGAGTAAATG GTACCCTTCT 32400

ACTCACTCTA CATGTCTGTG 32450

AGGTGGCAAG GCAGGTATCT 32500

ATTGAGAAGA GGTTGCATGA 32550

TACTCTTAAA AATCCCATTC 32600

TACCCTACCA GTCATTGACC 32650

TCCACTCTTG TCTCCAGTGA 32700

CCATTTGTCT TGTTAAGTCT 32750

GGGGTATGTG TTGAATGGTG 32800

TGATACAAGG TCTTACTGTA 32850

CAAGTGATCA TCCCACCTCA 32900

CATTGTGCCC ACCACCGATC 32950

GAAAATTTTA ATCATATACA 33000

TAGGAGACTA GAATATGCCA 33050 

TATTTCGAGC TAAAGGCAAC 33100 

TGTCCTTCTC CATTTGCCTA 33150 

TCCTTCCCTT TCTACCAGGA 33200

CAGACCCTTA TCAGTGTAGA 33250

TCATTTATTT TCCTTCCCAC 33300

TTCCTTTGTC ATGTCTCTTG 33350

TAAGCCACCT CTTTGAGAAT 33400

ACATGTATTA ATATACATGT 33450

TTCTGTCTTG TTACAGAGGT 33500

GAAAATATAA TTTCCTCCTG 33550

TTCCCACTTT TCACCTCCTA 33600

ATATATTACT TTATCTATAA 33650

CACACCTGTA ATCCCAACAC 33700

AGCCCAGGAG TTCAAGACCA 33750

TGGGCATGGT GGCACATGCC 33800

AGGAGGATCG CTTTAGCCCA 33850 

CACCACTGCA CTCCAGCCTG 33900

ATACATATTT TAGTATGTAT 33950

TACTTTAAAT AATAACAATA 34000

TGTCTCACAT TTTCCTTATT 34050

CAGAATCCAA ACAAGGTCCA 34100

GTTTGTTCAT CTTTAAGTTC 34150

ATTAATGTGA AAAAACAGGT 34200

GAGTTTGCTA CATTTATTCC 34250

CCCTGTGTTT CCTGTAAACT 34300

CAGGTTTTTA ATTGTATTTT 34350

GGAAGCACAG AATGTCTGGT 34400

ATGACCATTG CCTAATCCAT 34450

TAAAATAAAT TTTTTTTAAA 34500

TCTCATTTCG TTTCCCAGGC 34550

GCAGCCTTGA CCTCCTGGGA 34600

GTACCTGGAA CTACAGGTGC 34650

7TTGTGTACA GAAGGGGTTT 34 7 00 



CACTTCAGCT 
CCTAAATGAA 
TTTGTCTTTG 
TGGCTTTGAG 
TTAAACTCTG 
GAATTGTGTC 
ATTCATGGTG 
TTTAAATGCC 
CTTGCCACTG 
CTCTTTCTCC 
CAAGAAATTG 
AGAGAAGGAA 
GTATGAAACA 
AACTTTACTC 
GGCAGTTGCA 
TTTTAGAAGA 
CTCGCTCGTC 
AACCTCCGCC 
ACAACAATAT 
GACATCGAGA 
GTTCCTAAAT 
TGATAAGATG 
ATTTCCGATT 
GAAGTAGCGA 
TAGATTTAAT 
CCCCTGTTTT 
GCCTTTTACC 
CAACCCAGCT 
TTGGTAGCAT 
CACTAGCGGT 
GTCATTATGG 
' TACTAATACT 
AGATACATAT 
AAGAAGGAGA 
TACTTGCGGT 
TCTAAGACCT 
TTGTTCATTC 
AGTTTGGACA 
TTCTTAATAT 
ACTCATCCTA 
ATCTTAGAGA 
ACTCAACGCA 
ACATTCACTA 
TGGGTTGGTA 
AGCATTAATT 
TTCTTAAGTA 
ACATCTTATA 
TCTTAAAGAG 
TTGGAAGGAG 
AGTTTACAGG 
GAAGCTGAAG 
GGCAATATGG 
TGGTGGTGCA 
ATTGCTTGAG 
CTGTCACCCA 
TAAATAAAAA 
GCAAATGCCA 
AGGCAGTACA 
ACATTATTTA 
ATTTTACTAT 
CGGGGGAGAG 
GTAGACATTT 
GAGTCTCACT 
ACTGC AACCT 
CTAGTAGCTG 
ATTTTTAGTA 
TCCTGACCTC 
ACACGAGTGA 
TCATGTTTTA 
CCCTAAATTC 
AAATTTTGCA 
CGCTTACTCA 
AAATTATTAC 
CTCCCAGGCT 
TACCTGGGCT 
CACAGGCTTA 
CGATGTCTCA 
GATCTTCCTC 
ACCCAGCCCT 
ATT? AAA T G A 



TCCCAAAATC 
ATTATTTGTC 
TGTGTACATG 
CTTTGCTTTG 
ATCATTCTTG 
CTCCAGTGAT 
GTCACATGTG 
CCCAGGATAA 
GTTTCATTAA 
CCAACCACCA 
GTGGGCACCA 
GCTTCGAGTA 
CACCCTTTAC 
AAACACCCTG 
ATTTAGTAAA 
AAAATGCTAC 
ACCCAGGCTG 
TCCCGGGTTC 
TATTTTCAAA 
TTTTTGTAGC 
CTCAGGAATA 
ATGCTGAAAT 
GTTGGGGACT 
GGGGAATGGT 
TTTCTTATAC 
TATGGTTTAT 
TTCCTATGTC 
GGCAGAGCTG 
GAACGGCAAC 
CTAAAACGAT 
ATCCTAATAC 
TAGGATCACA 
TTCTATTAAG 
TTTAACTCTG 
TACCCTATCC 
TTGGGACCTC 
CAAACTTTCA 
GGGAGCAAAA 
TCAGGAAATT 
AGAGTCTAAA 
ATAAGTTTGC 
TTTAAATTAT 
AAGCAAAATA 
TAAAATATCA 
TTTATTGGTT 
GATTCTCATA 
AAAGCCTGTA 
TTAGGCATAT 
TTCCTTTCTC 
CTTGGCGCAG 
CAGGCAGATC 
CAAAACTCTC 
TGCCTGTAGT 
CCCAGGGGGG 
GCCTGGGTGA 
TTAAGAGTTT 
CATAAGTGAT 
GTAAGCACGC 
CACTGGGTAC 
AACTATAATC 
ATCCAGAAGT 
TTTCTTTCTT 
CTGTTGTCCA 
CCGCCTCCTG 
GGATTAGAGG 
GAGATGAGGT 
AAGTGATCCA 
GCCACCGTGC 
TAATTGGAAA 
TCTTTGATGA 
AAATAGTATC 
TATTAATGAC 
TATCATTATC 
GGAGAGTAGT 
CAAGTGATCC 
TGCTACCACA 
TTATGTTGCC 

A AAA ATT ATT 
A C A TCTGGTT 



CTGGGATTAC 
TCTAAACAGA 
TGTTTGTGTA 
AATTCTTGGA 
ACAGATATCC 
AAAAAGCAGC 
AGGTGAAAAA 
CAGTGATACT 
ATAAGGACAT 
CAACTAGGAT 
AGGTGTTAAT 
TACCTTCATT 
CAATCATCAA 
TTGCATGTGT 
GTTTTATACA 
TTTTGTTGTT 
GAGTGCAGTG 
AAGTGATTCT 
AGTTGTGACC 
CTCATACTCT 
TTCTCTAGAT 
ACTAATTCTA 
GGGAACTCTG 
TTGAATGGAT 
ATTTCAGTCT 
AATTTGAATT 
TGAAAATGGA 
TGAGGATCTC 
ATTTTTAATT 
CAT AAA AG AA 
TTAGGATGCA 
TTTGTAATTG 
TTAACCTCTT 
TATGCCATAA 
TTTTTCTAAC 
ATGGATTACT 
ATAAATTTAT 
GACAAAGTCA 
TATGTATGAA 
GCAAAAGGAT 
ATTTCAAAAT 
TTACTCTAAA 
TACCTTTATA 
TACCATGTGA 
AGAGTAAGAA 
CACTTTGGTT 
TTCAATGGAG 
AAATATTTTA 
ATGACTATTC 
TGGCTCATGC 
ACTTCAGCCC 
TCTACAAAAT 
CCCAGCTACT 
TCATGGCTGC 
CAGAGTGAGA 
ACAAAATTCT 
GTGTTCCAGG 
TTTCTCCAAA 
TGCTCTTTTA 
ATATAACATG 
CTTCCCAAGA 
CTTTTTTTTT 
GGCTAGAGTG 
GGTTCAAGC A 
CATGCATCAC 
TTCACCATGT 
CCTGCCTTAG 
CCTGCCCCTA 
ACTGGTGAAA 
GT AT AT ATT A 
CTAGATAAGT 
CTCGGAGAGT 
ATTATTTTTG 
GGTGCGGTCA 
TTCCTCCTCA 
CCTGGCTAAT 
CAGGCTGGTC 
AGTGCTGGG - 
AGGGTCCTGC 
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ACTTTGGCCA CCGTGCCTGG 34 800 

CAGAAGTTTT ACTTTAAAAA 34850

TGTGTGTGTG TCTAAAAGTT 34900

TGAACAATAA CCAAGAATAC 34950

CCTACAGGCT ATGGCCTTTT 35000

AAGCACGATA CTGCTCTCAG 35050

AAAAAAAAAG ATGAATCCTA 3 5100 

CTTTGTAGGA TAACTATTTG 3 5150 

AAGTAAAGAT CTATTTTTGT 35200 

TATTGGCTAT CTCTTCTGTT 3 52 50 

GGCAAGCGTG CAAGGTTCAA 3 5300 

GCACAAACAC TGACAAGTAA 35350 

GTTTTAGTGG GTAAGCCTGT 354 00 

CTATACATTG CATAAGTATA 35450 

AC GAT T T TAT TTTATTTTAT 3 5 500 

GTTGTTTTTT GAGACGGGGC 3 5550 

GTGCAATCTC AGCTCACTGC 35600 

TGAAGAGGAG AACAATAATA 35 650 

GCAGTTTCTG GAGTTGAGAA 3 5700 

TGCTTTAGGT AGCAAAAAAT 3 5750 

AGGTTTCAAT CTATCATTCC 35800 

GCCAAAAAAG ACCAGCTACC 3 5850 

GATAGTGAGG ACCCCAGTAG 3 5 900 

AAATTCATAA AAAATGTCAG 35 950 

TTTTATAAGG CTAGGAAAAG 36000 

CACATGAACC CACAAAATTT 3 6050 

TAGTCTGGCT GGCCTCTTAA 3 6100 

AGTGTGCTCT AGCCCAGACA 3 6150 

GTGTTTTCAA AATAGGAGCA 3 6200 

GGATACTAAG AGGGCCCACT 3 6250 

TTATGGATTG TCATTATGGA 3 6300 

AGTTTTTAAT TGCTTAAATT 3 6350 

TGCTTTTAGT CCAAGGTATA 3 64 00 

ACCTCCATAA. TGTCACCAAG 3 64 50 

AAGCAAGTGG -ftTAAATACCT 3 6500 

TTCCAAGTAA GTAATTTTCC 36550 

TGGTGTTTAT CAGAATAGAG 3 6600 

ACTATATCAA GTTCTAATAA 3 6650 

TACTTACTAA TATGAGTATA 3 67 00 

GTGAACACAA ACTAGCAGTT 3 67 50 

AACTTGACAT ATCAAGATCC 3 6800 

AAGACATAAT TCTTGGTAAC 3 6850 

TAATTGCTAT CAAAGGTATG 3 6900 

GATCAGTGTG ATTCCTTTAC 3 6950 

AAAGAATAGC TAGAGTATAT 3 7 000 

TCAAAAACCA ATTATTGACT 37 050 

TGCCAAAAAA TGACTATGAG 37100 

AGGTTTCTGT TCAATGTATG 37150 

TCATATTGGA GCATAAAAAG 37 2 00 

CTGTAATCCC AATACTTTGG 37250 

AGGAGTTTGA GACCAGCCTG 37 300 

ATACCAAAAT TAGCCAGGCG 37 350 

TGGGAAGCTG AGGTGGGAGG 37 4 00 

AGTGAGCTGT GATGGTGCCT 37 4 50 

CCCTGTCTCA AAAAAATAAA 37 500 

CACCATCTCC TCCCATCTTT 37550 

ACTATTAGCC TCGGAACCTG 37 600 

GTCCTGTCCC CC AC AG AC A A 37 650 

TTTTTTCCCC TCTATGCTTT 37700 

TAATAGGAAA AAGGCAGGGT 37 7 50 

GCCTTTCCAA CATAGCCTCT 37 8 00 

TTTTTTTTTT TTCTGAGACA 37 850 

CAGTGGCGTG ATCTAGGCTC 37 900 

ATTCTCCCAC CTCAGCCTCC 37 950 

CACGCCTGGC TAATTTTTGT 3 8 000 

GGGCCAGGCT GGTCTTGAAC 38050 

CCTCCCAAAG TGCTAGGATT 38100 

TTACATTCTG ATCACACATT 38150 

TTATAGACAA TGTTTTGTTC 38200 

CTTACACTCT TCTGTCTTTA 38250 

TTATGAGTGC ACAGTCTGTA 38 300 

TAAACAACAG TCACCTTTAA 38 350 

AGGCGGGGGT CTCATTCTGT 384 00 

CAGCTCACTG CAGCCACCGC 3 8450 

GCCTTCTGAG TAGCTGAGAC 38 500 

TTTTTAACTT TTTGTAGAGA 38550 

TCAAACTCCT AAGCTCAAGT 38 600 

<T>rr>r\/~7\ r-_r-_r~ II^ 1 2 ^ •l^' n C^ T O f C ^ 

ATAGTAAGAC TTTAATAAAT 38700



ACTATATTGC 

GCCTTAGCCG 

GGCTGAGTGA 

ATACATTTTG 

TCCTTCCAGC 

TGTCGTCATG 

TTAAACCCCA 

TTAAGCTTAC 

TGATTAAGCA 

CTTATCTCCA 

TGTAGCAAAA 

CTTGCAAGTT 

GGAGATATTT 

TGGCATTTCC 

AAACAAAACC 

AACCTCTGCT 

ACACAGGGCT 

TCACTGATGC 

AAATATATAA 

TAAATAAACT 

TATGTAGTGG 

GGGTGGGGGG 

AG T A AAA A A A 

GGATACATTC 

ATCATAGAGT 

TAGGCTATAT 

GTTACTGTAG 

TGTTAAGTAG 

GTTGTGCTAC 

ATAATTTTAT 

ACCAAAACAT 

AGATGAAAGA 

TAGGTTACTT 

TTATAGTGTT 

TACATGTATT 

TATACGTTCT 

GCTCTACCAG 

GCCTCCCAGG 

ACTACAGGCA 

AGATGGAGTT 

AGTGATCTGC 

CCACTGTGCT 

ACTTTGTTGA 

TATTTAGAAT 

CTTCATAAGC 

TTGATTTAAA 

TAAGAACGTT 

AAAATATAAA 

GCATTCTACC 

CTGCAGGGAG 

AATTTCATTG 

CTATATATCT 

AATAAAATAG 

CCACCATCAT 

ATGGGCCAGA 

CATAACTACT 

GCCTAAGTGA 

AAAAAAAAAT 

ATGGGTGGGG 

CAGATCTGTC 

CCTTGCCACC 

TTGCCAGCTT 

TGCTTGCATC 

GTATACTAAG 

TGCAAAGCAA 

AGATTTAGCA 

ATAGCTAATA 

ACTTTTATCT 

TTGAGACAGA 

CTTGGCTCAC 

GCTGGGATTA 

TGTAGAGATG 

CCTCGAACTC 

CTGGGATTAC 

ACAGGGTATC 

ATCACTGCAG 

TCCCAAGTAG 



CCAAGCTGGT 

CCCAAAGTGC 

ACATATTTTT 

CCCAGCATCC 

TTCATTTCAT 

TTATTGACTT 

CCCTCATTGC 

CCTTGATATA 

ATATAGCCTG 

GCAGGATTAA 

TATCCTCTCC 

TCTTAATTTC 

TCAAGACCTA 

CCCTTCACTC 

AACTCATATA 

ACAATCATGG 

GAGCGTCTCA 

TTAATGAGGA 

T A AT GC T AC A 

AATATACTCA 

ATGGATGTTT 

AAGAATCAAG 

A AAA AAA AG G 

CGAGAAATGT 

GAACTTACAC 

GACTAGCCTG 

CGAATATACA 

TTGTGTATCT 

AATGTTACAA 

CCTTTTATGG 

CCTTATGTGG 

ATGAATATAC 

TTATTTATCT 

TACT AT AT AA 

ACCTAAATGA 

CTTTTCTTTT 

GCTGGAGTGC 

TTCAAACGAT 

CACACCACCA 

TTGCCATGTT 

CTGCCTCAGC 

CGGCCTAATC 

CAATATAAAA 

TATGAAAATA 

TCTTGCCTAT 

TAATAAGTAT 

CAACAGTTTT 

ATTTTCTGTA 

AAAATTTCTT 

AGGGGAGTTA 

GCTACCATTT 

ATTTTCTTTA 

CCACCATTCC 

ATTGCCTATT 

CAGTAAGTAT 

CATCTCTGCC 

TATAGTGTTG 

TTATTTGGTC 

CATGCACCAC 

CAACTCAATG 

TTTAATGGAA 

TCTCATATAG 

TGAAAATAAA 

A G T AA AG C AA 

CTAGTGGGTG 

CAGTATTTTG 

ATACCTTGTT 

AAAGTTTTGT 

ATCTCTCTCT 

TGCAACTTTA 

TAGGCGTGTG 

GAGTTTCGCC 

CTGTCCTCAA 

AGGTGTGAGC 

ATTCTGTTGC 

CCTTTTAACT 

CTAGGACCAC 



CTCGAACTCC 
TGGGATTACA 
AACATAAAGG 
CCATTTCCGC 
CTGAAATTTG 
CAGAATATAA 
CCAGCCTGAT 
TGTGTAGCAT 
ATGGTATAAT 
TTCACAGTGA 
AAAAGCATAT 
ATGCAGAACA 
TTTTTGTTTG 
CATCTAAAAA 
GACTGAGTAC 
GCGTGCTATT 
TTAGGTCAAA 
CAGGGTGTGA 
TGGAAAAATA 
CACCATGGAA 
AATGGTGTGA 
TTTTAAGAAA 
TATGTACAGT 
GTCGATAGGT 
AAACCTAGAT 
TTGCTCCTAG 
AATACTTAAC 
AAACATATCT 
TGACTATGAC 
AACCACACTT 
CATATGACTG 
AT C A AAA TAT 
TAGTAATAAT 
AAGACACTGT 
TATAAATATA 

AGGGTGCAAT 
TCTCATGTCT 
TGCCCGGCTA 
GGCCAGGCTG 
CTCCCAAAGT 
TTACAAGTTT 
CATATTTGAG 
TCAATAGACC 
ATTGATTCGC 
GTATAAGAAA 
TAATTTGAAT 
GTTTAGCCAA 
AATAACAGTA 
GGCAGTTTAT 
ACGCTAAATT 
CATAAAAAAG 
AGAAGTTGTG 
ATATAGATTG 
TTCTGGCTTT 
ATTGTAGCTT 
AA AT AC A AGT 
TAAAAAAGAT 
TTGGTTAACT 
GTCTAACTCT 
AAACCTCTCC 
TTTTTTTGTG 
ATATACTAGT 
CTCAAGTTAT 
CTTGAGAGAC 
ATCTCGCTAG 
CCAAATACTG 
TTTGTTTTAT 
GTCACCCAGG 
AGCAATTCTC 
CCACCACGCC 
ATATTGGCCA 
GTGATCCACC 
CACCACACCC 
CCAGGCTTGA 
CCTGGGCTCA 
AGACACATGC 
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TGGACTCACG CAATCCTGCT 
GGCATGACCC ACCTCATCTG 
CCGTATTTTA TATTTATCTC 
CGAATCTGTT GCTTGCTAAT 
ACAAACATCT TCTATTTCTT 
AATAAAACAC TATACCCAAA 
GTGAAAATAA TCAGCATACA 
CTTTTAGATA AATATACAGC 
ATCTTGCCCA TGTACCTCAT 
TCAGATTTAC CTTTAAACTT 
CTAAAACTTT TGTGTGTACT 
GGCTCTTACC ACTGTTAGCT 
TGGTTTCCTG ATGATGGTCA 
TTGAGGTGAT ACAGGCTTTT 
AACTGCAATG CAGGCATGCT 
GATATGTCTT AAGTTACAGA 
ATGTAAACCA GTTTTTCTGC 
GAGATTTCTT TAAGGAAAAC 
TCTAACATTA GAGAATTAAG 
TCTTGTGCAG AC ATT AAA AT 
GAAAAAGTTA GGATGTGCTG 
ATACAGTATA CCCATACTTA 
CATGTGTTGC TTAATGATGG 
GATTTCATCC TTGTGTGAAC 
GGTCTAGCCT ACTATGTATC 
GCTACAAACC TGTAAAGCAT 
ACAATGGCAA GCTATCATTG 
AAAACATAGA AAACTAATGT 
ATTGCTAGGC AATAGGAATT 
ATATATGCGG TCCATGGTGG 
TATACATGTA CACAAAAAAT 
TTAAAATGGT TATAATGACT 
AATGATGATA GATAATACTT 
TATAAGTGTT CTACATACTT 
ACTCTGACAG TAACTAATCT 
CTTTTTTTAG AC AGAATC TT 
CTCGGCTCAC TGCAACCTCC 
CAGCCTCCTG AGTAGCTGGG 
ATTTTTGTAT TTTTGGGTAG 
ATCTTGAACT CCTGGCCTCA 
GCTGGGATTA CAGGTGTGAA 
TCAATATTTA AAGAGTGCTA 
AAAAAGAGAT ATAAGCATCT 
TACAGCCGAC TAAAGCTTTT 
TCCTGTGAAT ATGCATTAAT 
TAACACTTTT CCTTAATTTT 
TCCAATAGTG AAATACATAG 
ATTGTTTTTG TTTCACCACA 
AGAAAATGAA TGCATACCTC 
GGGCATAGTT ACAAGTGAGA 
CATAAAAACT GCATTCAATT 
GTTTCAATTA TTGGCCATTA 
TCATGTTTAT CCTTTTTATA 
TGTGTGTTCC ATTTTCTGTA 
GGAGTCCATA TGGTCTCTAT 
AAAGATTATC TAGGTCAAAT 
TATATAATAT AGGCTGCCAC 
TTCATGACTT TTGTAGCAGC 
CGGTGTATCT TTCTCCTTTG 
AAAGATGGTG GATGATCAAA 
GGCCAGGAAG TTCACTGGGC 
ATAAGAAATG CCAAAGTTGC 
CCTGACACTG AATTTTTCAA 
AGGAAAGGAA GCAGATACCT 
ACTGGGACAC TGTCAGTGCT 
GTAGAACACT GCTAATAATA 
CTTAGCATTT TGCATGTTTT 
TATTTATTTA TTTATTTATT 
CTGGAGTGCC ATGGTGCGAT 
CTGCCTCAGC TTCCTGAGTA 
CAGCTACTTT CTATATTTTT 
AGCTGGTCTC GAACTCCTGT 
CGCCTCAGCC TCTCAAAGTG 
AGCAGTGTTT TATTTTTGAG 
GTGCAGTGGT GCAATCATAG 
AGTCATCCTC CTGCTTAGCC 
CATCACACTT GGCTATTTTT 
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38800 

38850 

38900 

38950 

39000 

39050 

39100 

39150 

39200 

39250 

39300 

39350 

39400 

39450 

39500 

39550 

39600 

39650 

39700 

39750 

39800 

39850 

39900 

39950 

40000 

40050 

40100 

40150 

40200 

40250 

40300 

40350 

40400 

40450 

40500 

40550 

40600 

40650 

40700 

40750 

40800 

40850 

40900 

40950 

41000 

41050 

41100 

41150 

41200 

41250 

41300 

41350 

41400 

41450 

41500 

41550 

41600 

41650 

41700 

41750 

41800 

41850 

41900 

41950 

42000 

42050 

42100 

42150 

42200 

42250 

42300 

42350 

42400 

42450 

42500 

42550 

42600 

42650

42700

42750



CTCTGTGCAG TGTTGCTAGT CAGCGAAAGA CTATAATACC TGTGGGGACA 42800

GCGATTAGCC ACCACAACCA GTCTTTATTT AAAGTTATTA AAAATGGCTG 42850

GGCGCAGTGG CTCACACCTG TAATCCTAGC ACTTTGGGAG GCCGAGGCAG 42900

ATGGATCACC TGACGTGAGG AATTTGAGAC CAGCCTGGCC AACATGGTGA 42950

AACCCCATCT CTACTAAAAA ATACAAAAAT TAGCTGGGTG TGGTCCTGTA 43000

GTCCCAGCTA CTTGGGAGGC TGGGGCAGGA GAATTACTTG AACCCAGGAG 43050

GCAGAGGTTG CAGTGAGCCG AGATTGTGCC ACTGCACTCC AGCCTGGGTG 43100

ACAGAGAGAG ATTCCATCTC AAAAAAACAA GTTATTAAAA ATGTATATGA 43150

ATGCTCCTAA TATGGTCAGG AAGCAAGGAA GCGAAGGATA TATTATGAGT 43200

TTTAAGAAGG TGCTTAGCTG TATATTTATC TTTCAAAATG TATTAGAAGA 43250

TTTTAGAATT CTTTCCTTCA TGTGCCATCT CTACAGGCAC CCATCAGAAA 43300

AAGCATACTG CCGTTACCGT GAAACTGGTT GTAAAAGAGA AACTATCTAT 43350

TTGCACCTTA AAAGACAGCT AGATTTTGCT GATTTTCTTC TTTCGGTTTT 43400

CTTTGTCAGC AATAATATGT GAGAGGACAG ATTGTTAGAT ATGATAGTAT 43450

AAAAAATGGT TAATGACAAT TCAGAGGCGA GGAGATTCTG TAAACTTAAA 43500

ATTACTATAA ATGAAATTGA TTTGTCAAGA GGATAAATTT TAGAAAACAC 43550

CCAATACCTT ATAACTGTCT GTTAATGCTT GCTTTTTCTC TACCTTTCTT 43600

CCTTGTTTCA GTTGGGAAGC TTTTGGCTGC AAGTAACAGA AACTCCTAAT 43650

TCAAATGGCT TAAGCAATAA GGAAATGTAT ATTCCCACAT AACTAGACGT 43700

TCAAACAGGC CAGGCTCCAG CACTTCAGTA CGTCACCAGG GATCTGGGTT 43750

CTTCCCAGCT CTCTGCTCTG CCATCTTTAG CGCTGGCTTC ATTCTCAGAC 43800

TCTGGTAGCA TGATGGCTGT AGCTGTTTCA TGGGCCCCTT CAAACCTCAT 43850

AGCAACCAGA GGAAGAAAAT GAGCCATTTT TTGAGTCTCC TTCATAGACT 43900

TGAATAACTC TTTTTCAGAG CTTCTCACAG CAAACCTCTC CTCATGTCTC 43950

CTCATGTCTT ATTGTTCAGA AATGGGTAAT GTGGCCATTT CACCAGTCAC 44000

TGCCAACAAC AACGAGGTTC CTATAATTGT CTCTGAGTAA CCCTTTGGAA 44050

TGGAGAGGGT GTTGGTCAGT CTACAAACTG AACACTGCAG TTCTGCGCTT 44100

TTTACCAGTG AAAAAATGTA ATTATTTTCC CCTCTTAAGG ATTAATATTC 44150

TTCAAATGTA TGCCTGTTAT GGATATAGTA TCTTTAAAAT TTTTTATTTT 44200

AATAGCTTTA GGGGTACACA CTTTTTGCTT ACAGGGGTGA ATTGTGTAGT 44250

GGTGAAGACT CGGCTTTTAA TGTACTTGTC ACCTGAGTGA TGTACATTGT 44300

ACCCAATAGG TAATTTTTCA TCCATTACCC TCCTTCCGCC CTCTTCCCTT 44350

CTGAGTCTCC AACATCCCTT ATACCACTGT GTATGTTCTT GTGTACCTAC 44400

AGCTAAGCTT CCACTTATAA GTGAGAACAT GCAGTATTTG GTTTTCCATT 44450

CCTGAGTTAC TTCCCTTAGG ATAACAGCCC CCAGTTCCGT CCAAGTTGCT 44500

GCAAAATACA TTATTCTTCT TTATGGCTGA GTAATAGTCC ATGGTACATA 44550

TATACCACAT TTTCTTTATC CACTTATCAG TTGATGGACA CTTAGGTTAA 44600

TTCCATTCAA TTTCATTCAA TTTAAGTATA TTTGTAAGGA GCTAAAGCTG 44650

AAAATTAAAT TTTAGATCTT TCAATACTCT TAAATTTTAT ATGTAAGTGG 44700

TTTTTATATT TTCACATTTG AAATAAAGTA ATTTTTATAA CCTTGATATT 44750

GTATGACTAT TCTTTTAGTA ATGTAAAGCC TACAGACTCC TACATTTGGA 44800

ACCACTAGTG TGTTGTTTCA CCCCTTGTTA TACTATCAGG ATCCTCGA 44848



(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2396
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43




TTTCTAGTTG CTTTTAGCCA ATGTCGGATC AGGTTTTTCA AGCGACAAAG 50

AGATACTGAG ATCCTGGGCA GAGGACATCC TAGCTCGGTC AGATTTGGGC 100

AGGCTCAAGT GACCAGTGTC TTAAGGCAGA AGGGAGTCGG GGTAGGGTCT 150

GGCTGAACCC TCAACCGGGG CTTTTAACTC AGGGTCTAGT CCTGGCGCCA 200

AATGGATGGG ACCTAGAAAA GGTGACAGAG TGCGCAGGAC ACCAGGAAGC 250

TGGTCCCACC CCTGCGCGGC TCCCGGGCGC TCCCTCCCCA GGCCTCCGAG 300

GATCTTGGAT TCTGGCCACC TCCGCACCCT TTGGATGGGT GTGGATGATT 350

TCAAAAGTGG ACGTGACCGC GGCGGAGGGG AAAGCCAGCA CGGAAATGAA 400

AGAGAGCGAG GAGGGGAGGG CGGGGAGGGG AGGGCGCTAG GGAGGGACTC 450

CCGGGAGGGG TGGGAGGGAT GGAGCGCTGT GGGAGGGTAC TGAGTCCTGG 500

CGCCAGAGGC GAAGCAGGAC CGGTTGCAGG GGGCTTGAGC CAGCGCGCCG 550

GCTGCCCCAG CTCTCCCGGC AGCGGGCGGT CCAGCCAGGT GGGATGCTGA 600

GGCTGCTGCT GCTGTGGCTC TGGGGGCCGC TCGGTGCCCT GGCCCAGGGC 650

GCCCCCGCGG GGACCGCGCC GACCGACGAC GTGGTAGACT TGGAGTTTTA 700

CACCAAGCGG CCGCTCCGAA GCGTGAGTCC CTCGTTCCTG TCCATCACCA 750

TCGACGCCAG CCTGGCCACC GACCCGCGCT TCCTCACCTT CCTGGGCTCT 800

CCAAGGCTCC GTGCTCTGGC TAGAGGCTTA TCTCCTGCAT ACTTGAGATT 850

TGGCGGCACA AAGACTGACT TCCTTATTTT TGATCCGGAC AAGGAACCGA 900

\- - T <~ C G 'o . AAGAAGTTAC TGGAAATCTC G T C AA C C A TGATATTTGC 950



AGGTCTGAGC CGGTCTCTGC TGCGGTGTTG AGGAAACTCC AGGTGGAATG 1000 
GCCCTTCCAG GAGCTGTTGC TGCTCCGAGA GCAGTACCAA AAGGAGTTCA 1050 
AGAACAGCAC CTACTCAAGA AGCTCAGTGG ACATGCTCTA CAGTTTTGCC 1100 
AAGTGCTCGG GGTTAGACCT GATCTTTGGT CTAAATGCGT TACTACGAAC 1150 
CCCAGACTTA CGGTGGAACA GcTCCAACGC CCAGCTTCTC CTTGACTACT 1200 
GCTCTTCCAA GGGTTATAAC ATcTCCTGGG AACTGGGCAA TGAGCCCAAC 1250 
AGTTTcTGGA AGAAAGCTCA CATTCTCATC GATGGGTTGC AGTTAGGAGA 1300 
AGACTTTGTG GAGTTGCATA AACTTcTACA AAGGTCAGCT TTCCAAAATG 1350 
CAAAACTCTA TGGTCCTGAC ATCGGTCAGC CTCGAGGGAA GACAGTTAAA 1400 
CTGCTGAGGA GTTTCCTGAA GGCTGGCGGA GAAGTGATCG ACTCTCTTAC 1450 
ATGGCATCAC TATTACTTGA ATGGACGCAT CGCTACCAAA GAAGATTTTC 1500 
TGAGCTCTGA TGCGCTGGAC ACTTTTATTC TCTCTGTGCA AAAAATTCTG 1550 
AAGGTCACTA AAGAGATCAC ACCTGGCAAG AAGGTCTGGT TGGGAGAGAC 1600 
GAGCTCAGCT TACGGTGGCG GTGCACCCTT GCTGTCCAAC ACCTTTGCAG 1650 
CTGGCTTTAT GTGGCTGGAT AAATTGGGCC TGTCAGCCCA GATGGGCATA 1700 
GAAGTCGTGA TGAGGCAGGT GTTCTTCGGA GCAGGCAACT ACCACTTAGT 1750 
GGATGAAAAC TTTGAGCCTT TACCTGATTA CTGGCTCTCT CTTCTGTTCA 1800 
AGAAACTGGT AGGTCCCAGG GTGTTACTGT CAAGAGTGAA AGGCCCAGAC 1850 
AGGAGCAAAC TCCGAGTGTA TCTCCACTGC ACTAACGTCT ATCACCCACG 1900 
ATATCAGGAA GGAGATCTAA CTCTGTATGT CCTGAACCTC CATAATGTCA 1950 
CCAAGCACTT GAAGGTACCG CCTCCGTTGT TCAGGAAACC AGTGGATACG 2000 
TACCTTCTGA AGCCTTCGGG GCCGGATGGA TTACTTTCCA AATCTGTCCA 2050 
ACTGAACGGT CAAATTCTGA AGATGGTGGA TGAGCAGACC CTGCCAGCTT 2100 
TGACAGAAAA ACCTCTCCCC GCAGGAAGTG CACTAAGCCT GCCTGCCTTT 2150 
TCCTATGGTT TTTTTGTCAT AAGAAATGCC AAAATCGCTG CTTGTATATG 2200 
AAAATAAAAG GCATACGGTA CCCCTGAGAC AAAAGCCGAG GGGGGTGTTA 2250 
TTCATAAAAC AAAACCCTAG TTTAGGAGGC CACCTCCTTG CCGAGTTCCA 2300 
GAGCTTCGGG AGGGTGGGGT ACACTTCAGT ATTACATTCA GTGTGGTGTT 2350 
CTCTCTAAGA AGAATACTGC AGGTGGTGAC AGTTAATAGC ACTGTG 2396 



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 535 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44 

Met Leu Arg Leu Leu Leu Leu Trp Leu Trp Gly Pro Leu Gly Ala 
5 10 15 

Leu Ala Gln Gly Ala Pro Ala Gly Thr Ala Pro Thr Asp Asp Val 
20 25 30 

Val Asp Leu Glu Phe Tyr Thr Lys Arg Pro Leu Arg Ser Val Ser 
35 40 45 

Pro Ser Phe Leu Ser Ile Thr Ile Asp Ala Ser Leu Ala Thr Asp 
50 55 60 

Pro Arg Phe Leu Thr Phe Leu Gly Ser Pro Arg Leu Arg Ala Leu 
65 70 75 

Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys 
80 85 90 

Thr Asp Phe Leu Ile Phe Asp Pro Asp Lys Glu Pro Thr Ser Glu 
95 100 105 

Glu Arg Ser Tyr Trp Lys Ser Gln Val Asn His Asp Ile Cys Arg 
110 115 120 

Ser Glu Pro Val Ser Ala Ala Val Leu Arg Lys Leu Gln Val Glu 
125 130 135 

Trp Pro Phe Gln Glu Leu Leu Leu Leu Arg Glu Gln Tyr Gln Lys 
140 145 150 

Glu Phe Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Met Leu 
155 160 165 

Tyr Ser Phe Ala Lys Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu 
170 175 180 

Asn Ala Leu Leu Arg Thr Pro Asp Leu Arg Trp Asn Ser Ser Asn 
185 190 195 

Ala Gln Leu Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile 
200 205 210 

Ser Trp Glu Leu Gly Asn Glu Pro Asn Ser Phe Trp Lys Lys Ala 
215 220 225 

His Ile Leu Ile Asp Gly Leu Gln Leu Gly Glu Asp Phe Val Glu 
230 235 240 

Leu His Lys Leu Leu Gln Arg Ser Ala Phe Gln Asn Ala Lys Leu 
245 250 255 

Tyr Gly Pro Asp Ile Gly Gln Pro Arg Gly Lys Thr Val Lys Leu 
260 265 270 

Leu Arg Ser Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Leu 
275 280 285 

Thr Trp His His Tyr Tyr Leu Asn Gly Arg Ile Ala Thr Lys Glu 
290 295 300 

Asp Phe Leu Ser Ser Asp Ala Leu Asp Thr Phe Ile Leu Ser Val 
305 310 315 

Gln Lys Ile Leu Lys Val Thr Lys Glu Ile Thr Pro Gly Lys Lys 
320 325 330 

Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro 
335 340 345 

Leu Leu Ser Asn Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys 
350 355 360 

Leu Gly Leu Ser Ala Gln Met Gly Ile Glu Val Val Met Arg Gln 
365 370 375 

Val Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe 
380 385 390 

Glu Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu 
395 400 405 

Val Gly Pro Arg Val Leu Leu Ser Arg Val Lys Gly Pro Asp Arg 
410 415 420 

Ser Lys Leu Arg Val Tyr Leu His Cys Thr Asn Val Tyr His Pro 
425 430 435 

Arg Tyr Gln Glu Gly Asp Leu Thr Leu Tyr Val Leu Asn Leu His 
440 445 450 

Asn Val Thr Lys His Leu Lys Val Pro Pro Pro Leu Phe Arg Lys 
455 460 465 

Pro Val Asp Thr Tyr Leu Leu Lys Pro Ser Gly Pro Asp Gly Leu 
470 475 480 

Leu Ser Lys Ser Val Gln Leu Asn Gly Gln Ile Leu Lys Met Val 
485 490 495 

Asp Glu Gln Thr Leu Pro Ala Leu Thr Glu Lys Pro Leu Pro Ala 
500 505 510 

Gly Ser Ala Leu Ser Leu Pro Ala Phe Ser Tyr Gly Phe Phe Val 
515 520 525 

Ile Arg Asn Ala Lys Ile Ala Ala Cys Ile 
530 535 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2396 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45 



TT TCT AGT 8 

TGC TTT TAG CCA ATG TCG GAT CAG GTT TTT CAA GCG ACA AAG AGA 53 

TAC TGA GAT CCT GGG CAG AGG ACA TCC TAG CTC GGT CAG ATT TGG 98 

GCA GGC TCA AGT GAC CAG TGT CTT AAG GCA GAA GGG AGT CGG GGT 143 

AGG GTC TGG CTG AAC CCT CAA CCG GGG CTT TTA ACT CAG GGT CTA 188 

GTC CTG GCG CCA AAT GGA TGG GAC CTA GAA AAG GTG ACA GAG TGC 233 

GCA GGA CAC CAG GAA GCT GGT CCC ACC CCT GCG CGG CTC CCG GGC 278 

GCT CCC TCC CCA GGC CTC CGA GGA TCT TGG ATT CTG GCC ACC TCC 323 

GCA CCC TTT GGA TGG GTG TGG ATG ATT TCA AAA GTG GAC GTG ACC 368 

GCG GCG GAG GGG AAA GCC AGC ACG GAA ATG AAA GAG AGC GAG GAG 413 

GGG AGG GCG GGG AGG GGA GGG CGC TAG GGA GGG ACT CCC GGG AGG 458 

GGT GGG AGG GAT GGA GCG CTG TGG GAG GGT ACT GAG TCC TGG CGC 503 

CAG AGG CGA AGC AGG ACC GGT TGC AGG GGG CTT GAG CCA GCG CGC 548 

CGG CTG CCC CAG CTC TCC CGG CAG CGG GCG GTC CAG CCA GGT GGG 593 

ATG CTG AGG CTG CTG CTG CTG TGG CTC TGG GGG CCG CTC GGT GCC 638 
Met Leu Arg Leu Leu Leu Leu Trp Leu Trp Gly Pro Leu Gly Ala 
5 10 15 

CTG GCC CAG GGC GCC CCC GCG GGG ACC GCG CCG ACC GAC GAC GTG 683 
Leu Ala Gln Gly Ala Pro Ala Gly Thr Ala Pro Thr Asp Asp Val 
20 25 30 

GTA GAC TTG GAG TTT TAC ACC AAG CGG CCG CTC CGA AGC GTG AGT 728 
Val Asp Leu Glu Phe Tyr Thr Lys Arg Pro Leu Arg Ser Val Ser 
35 40 45 

CCC TCG TTC CTG TCC ATC ACC ATC GAC GCC AGC CTG GCC ACC GAC 773 
Pro Ser Phe Leu Ser Ile Thr Ile Asp Ala Ser Leu Ala Thr Asp 
50 55 60 

CCG CGC TTC CTC ACC TTC CTG GGC TCT CCA AGG CTC CGT GCT CTG 818 
Pro Arg Phe Leu Thr Phe Leu Gly Ser Pro Arg Leu Arg Ala Leu 
65 70 75 

GCT AGA GGC TTA TCT CCT GCA TAC TTG AGA TTT GGC GGC ACA AAG 863 
Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys 
80 85 90 

ACT GAC TTC CTT ATT TTT GAT CCG GAC AAG GAA CCG ACT TCC GAA 908 
Thr Asp Phe Leu Ile Phe Asp Pro Asp Lys Glu Pro Thr Ser Glu 
95 100 105 

GAA AGA AGT TAC TGG AAA TCT CAA GTC AAC CAT GAT ATT TGC AGG 953 
Glu Arg Ser Tyr Trp Lys Ser Gln Val Asn His Asp Ile Cys Arg 
110 115 120 

TCT GAG CCG GTC TCT GCT GCG GTG TTG AGG AAA CTC CAG GTG GAA 998 
Ser Glu Pro Val Ser Ala Ala Val Leu Arg Lys Leu Gln Val Glu 
125 130 135 

TGG CCC TTC CAG GAG CTG TTG CTG CTC CGA GAG CAG TAC CAA AAG 1043 
Trp Pro Phe Gln Glu Leu Leu Leu Leu Arg Glu Gln Tyr Gln Lys 
140 145 150 



GAG TTC AAG AAC AGC ACC TAC TCA AGA AGC TCA GTG GAC ATG CTC 1088 
Glu Phe Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Met Leu 
155 160 165 

TAC AGT TTT GCC AAG TGC TCG GGG TTA GAC CTG ATC TTT GGT CTA 1133 
Tyr Ser Phe Ala Lys Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu 
170 175 180 

AAT GCG TTA CTA CGA ACC CCA GAC TTA CGG TGG AAC AGC TCC AAC 1178 
Asn Ala Leu Leu Arg Thr Pro Asp Leu Arg Trp Asn Ser Ser Asn 
185 190 195 

GCC CAG CTT CTC CTT GAC TAC TGC TCT TCC AAG GGT TAT AAC ATc 1223 
Ala Gln Leu Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile 
200 205 210 

TCC TGG GAA CTG GGC AAT GAG CCC AAC AGT TTc TGG AAG AAA GCT 1268 
Ser Trp Glu Leu Gly Asn Glu Pro Asn Ser Phe Trp Lys Lys Ala 
215 220 225 

CAC ATT CTC ATC GAT GGG TTG CAG TTA GGA GAA GAC TTT GTG GAG 1313 
His Ile Leu Ile Asp Gly Leu Gln Leu Gly Glu Asp Phe Val Glu 
230 235 240 

TTG CAT AAA CTT cTA CAA AGG TCA GCT TTC CAA AAT GCA AAA CTC 1358 
Leu His Lys Leu Leu Gln Arg Ser Ala Phe Gln Asn Ala Lys Leu 
245 250 255 

TAT GGT CCT GAC ATC GGT CAG CCT CGA GGG AAG ACA GTT AAA CTG 1403 
Tyr Gly Pro Asp Ile Gly Gln Pro Arg Gly Lys Thr Val Lys Leu 
260 265 270 

CTG AGG AGT TTC CTG AAG GCT GGC GGA GAA GTG ATC GAC TCT CTT 1448 
Leu Arg Ser Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Leu 
275 280 285 

ACA TGG CAT CAC TAT TAC TTG AAT GGA CGC ATC GCT ACC AAA GAA 1493 
Thr Trp His His Tyr Tyr Leu Asn Gly Arg Ile Ala Thr Lys Glu 
290 295 300 

GAT TTT CTG AGC TCT GAT GCG CTG GAC ACT TTT ATT CTC TCT GTG 1538 
Asp Phe Leu Ser Ser Asp Ala Leu Asp Thr Phe Ile Leu Ser Val 
305 310 315 

CAA AAA ATT CTG AAG GTC ACT AAA GAG ATC ACA CCT GGC AAG AAG 1583 
Gln Lys Ile Leu Lys Val Thr Lys Glu Ile Thr Pro Gly Lys Lys 
320 325 330 

GTC TGG TTG GGA GAG ACG AGC TCA GCT TAC GGT GGC GGT GCA CCC 1628 
Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro 
335 340 345 

TTG CTG TCC AAC ACC TTT GCA GCT GGC TTT ATG TGG CTG GAT AAA 1673 
Leu Leu Ser Asn Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys 
350 355 360 

TTG GGC CTG TCA GCC CAG ATG GGC ATA GAA GTC GTG ATG AGG CAG 1718 
Leu Gly Leu Ser Ala Gln Met Gly Ile Glu Val Val Met Arg Gln 
365 370 375 

GTG TTC TTC GGA GCA GGC AAC TAC CAC TTA GTG GAT GAA AAC TTT 1763 
Val Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe 
380 385 390 

GAG CCT TTA CCT GAT TAC TGG CTC TCT CTT CTG TTC AAG AAA CTG 1808 
Glu Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu 
395 400 405 

GTA GGT CCC AGG GTG TTA CTG TCA AGA GTG AAA GGC CCA GAC AGG 1853 
Val Gly Pro Arg Val Leu Leu Ser Arg Val Lys Gly Pro Asp Arg 
410 415 420 

AGC AAA CTC CGA GTG TAT CTC CAC TGC ACT AAC GTC TAT CAC CCA 1898 
Ser Lys Leu Arg Val Tyr Leu His Cys Thr Asn Val Tyr His Pro 
425 430 435 

CGA TAT CAG GAA GGA GAT CTA ACT CTG TAT GTC CTG AAC CTC CAT 1943 
Arg Tyr Gln Glu Gly Asp Leu Thr Leu Tyr Val Leu Asn Leu His 
440 445 450 

AAT GTC ACC AAG CAC TTG AAG GTA CCG CCT CCG TTG TTC AGG AAA 1988 
Asn Val Thr Lys His Leu Lys Val Pro Pro Pro Leu Phe Arg Lys 
455 460 465 

CCA GTG GAT ACG TAC CTT CTG AAG CCT TCG GGG CCG GAT GGA TTA 2033 
Pro Val Asp Thr Tyr Leu Leu Lys Pro Ser Gly Pro Asp Gly Leu 
470 475 480 

CTT TCC AAA TCT GTC CAA CTG AAC GGT CAA ATT CTG AAG ATG GTG 2078 
Leu Ser Lys Ser Val Gln Leu Asn Gly Gln Ile Leu Lys Met Val 
485 490 495 

GAT GAG CAG ACC CTG CCA GCT TTG ACA GAA AAA CCT CTC CCC GCA 2123 
Asp Glu Gln Thr Leu Pro Ala Leu Thr Glu Lys Pro Leu Pro Ala 
500 505 510 

GGA AGT GCA CTA AGC CTG CCT GCC TTT TCC TAT GGT TTT TTT GTC 2168 
Gly Ser Ala Leu Ser Leu Pro Ala Phe Ser Tyr Gly Phe Phe Val 
515 520 525 

ATA AGA AAT GCC AAA ATC GCT GCT TGT ATA TGA AAA TAA AAG GCA 2213 
Ile Arg Asn Ala Lys Ile Ala Ala Cys Ile 
530 535 

TAC GGT ACC CCT GAG ACA AAA GCC GAG GGG GGT GTT ATT CAT AAA 2258 

ACA AAA CCC TAG TTT AGG AGG CCA CCT CCT TGC CGA GTT CCA GAG 2303 

CTT CGG GAG GGT GGG GTA CAC TTC AGT ATT ACA TTC AGT GTG GTG 2348 

TTC TCT CTA AGA AGA ATA CTG CAG GTG GTG ACA GTT AAT AGC ACT 2393 

GTG 2396 



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 385 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46 

CGGCCGCTGC TGCTGCTGTG GCTCTGGGGG CGGCTCCGTG CCCTGACCCA 50 

AGGCACTCCG GCGGGGACCG CGCCGACCAA AGACGTGGTG GACTTGGAGT 100 

TTTACACCAA GAGGCTATTC CAAAGCGTGA GTCCCTCGTT CCTGTCCATC 150 

ACCATCGACG CCAGTCTGGC CACCGACCCT CGGTTCCTCA CCTTCCTGAG 200 

CTCTCCACGG CTTCGAGCCC TGTCTAGAGG CTTATCTCCT GCGTACTTGA 250 

GATTTGGCGG CACCAAGACT GACTTCCTTA TTTTTGATCC CAACAACGAA 300 

CCCACCTCTG AAGAAAGAAG TTACTGGCAA TCTCAAGACA ACAATGATAT 350 

TTGCGGGTCT GACCGGGTCT CCGCTGACGT GTTGA 385 



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 541 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47 




AAATCAGGAC ATATCCTTCA CTTATTTGCC TCTTGGTCAT ATTGGAGGCA 50 

TTTGTATTCA TTTTTAATAA CCCTCAAAAT AGTGCATGCA AAGTGCTAAG 100 

CGTCATTTGC CACATGGTGC CATTAACTGT CACCACCTGC AGTGGTCTAC 150 

TTAGAGAACA CCGCACTGGA TGTTAACACT GAAGCGCGTG CCCCGCCCTC 200 

CCGAGGCTCT GGATCCAGCG TTGAAGCTTG CCCCGCCCTC CCGAGGCTCT 250 

GGATCCAGCA CTGGAGCATG CCCCGCCCTC CCGAGGCTCT GGAGCTTGCT 300 

AAGGAGTCCG CTCCCTACCG CTGGGGTTTT GCTTTATTCT TATGAATGAC 350 

ACCCCTGACC GCTTTCGTCT CAGGGGTACT GTAATGCCTT TTATTTTCAT 400 

ATACAAGCTG CGATTTTGGC ATTTCTTATG ACAAAAAACC CATAGGAAAA 450 

GGCGGGCACG CTTAGTGAGC TTCCTGCGGG GAGAGGTTTT TCTGTTAGAG 500 

CTGGCANGGT CTGCTCATCG ACCATCTTCA GGCCTCGTGC C 